USING GPU AND POWER8
TO EXPLORE HOW
GENOMES FOLD
Ido Machol
Aiden Lab
Baylor College of Medicine
Rice University
GTC 2015
THE HUMAN GENOME
IS LONG!
…CGTTTACGAAAATCGCAAAACTTTCGATACCCATAGGCTACTGATCATACGACCGTTTACGAAAATCGAAACCTTTCCGATCTAGGCTAC…
3 BILLION Letters
2 METERS
Nucleus Cell
6 μm
Scale: 10 bp, 100 bp, 1 Kb, 10 Kb, 100 Kb, 1 Mb, 10 Mb, 100 Mb
SAME GENOME, DIFFERENT
FUNCTIONS
PART I:
TECHNOLOGY
MICROSCOPY &
FLUORESCENT IN SITU HYBRIDIZATION
FISH
CONTACT MAPPING
Exploring structure via proximity
4-11 (lives nearby)
0-3 (lives far away)
Always (same person)
Times in the Same Photo
FACEBOOK CONTACT MAP
Homer
Simpsons'
Contact
Map
# of Pictures Together
2 0 1 2 1 0 1 0 0
0 3 2 1 0 0 0 0 0
1 2 16 6 5 4 11 1 1
2 1 6 8 6 3 4 0 0
1 0 5 6 8 4 5 1 0
0 0 4 3 4 5 5 0 0
1 0 11 4 5 5 11 1 1
0 0 1 0 1 0 1 2 1
0 0 1 0 0 0 1 1 1
Color scale: 0 to 16
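The photo analogy can be made concrete with a small sketch. The function below counts how often each pair of people appears in the same photo, which is the same bookkeeping a contact map does for pairs of genomic positions. The function name `contact_map` and the sample photos are hypothetical and are not taken from the talk.

```python
from itertools import combinations


def contact_map(photos):
    """Count, for every pair of people, the photos they share.

    `photos` is a list of collections of names (hypothetical input).
    The diagonal holds how many photos each person appears in, which
    matches the "Always (same person)" entry in the legend.
    """
    people = sorted({name for photo in photos for name in photo})
    index = {name: k for k, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for photo in photos:
        members = sorted(set(photo))
        for name in members:
            matrix[index[name]][index[name]] += 1
        for a, b in combinations(members, 2):
            # Keep the map symmetric: a with b is the same contact as b with a.
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return people, matrix


if __name__ == "__main__":
    sample = [{"Homer", "Marge"}, {"Homer", "Bart"}, {"Homer", "Marge", "Bart"}]
    names, counts = contact_map(sample)
    print(names)
    for row in counts:
        print(row)
```

High counts suggest two people live close together; in Hi-C the same reasoning is applied to pairs of DNA positions.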
Hi-C
3D Genome Sequencing
Hi-C: genome-wide Chromosome
Conformation Capture
Erez Lieberman-Aiden, Nynke van Berkum
et al. Science 2009
Computational Challenge I
Alignment, calculate contacts
…CTGCCTCCTCGCGG CCGCGTGGTGGCAG…
DNA Reference
Sequence
Align to reference genome
… …
Alignment is not trivial
Reference: …CTGCC_TCCTCGCGG…
Substitution: …CTGAA_TCCTCGCGG…
Deletion: …CTGC__TCCTCGCGG…
Insertion: …CTGCCCTCCTCGCGG…
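One way to see why alignment is not trivial: a read can differ from the reference by substitutions, deletions, and insertions, so a plain character-by-character comparison is not enough. The sketch below computes the classic edit (Levenshtein) distance between the reference fragment and the three variants from the slide, with the `_` gap markers removed. It only illustrates the idea; the talk does not say which aligner was used, and production aligners rely on indexed heuristics rather than this quadratic dynamic program.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


REFERENCE = "CTGCCTCCTCGCGG"
READS = {
    "substitution": "CTGAATCCTCGCGG",
    "deletion": "CTGCTCCTCGCGG",
    "insertion": "CTGCCCTCCTCGCGG",
}

if __name__ == "__main__":
    for kind, read in READS.items():
        print(kind, edit_distance(REFERENCE, read))
```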
Computational HW and SW setup
8 x Power8 Servers
2 Sockets x 12 cores x 8 threads = 192 virtual cores each
Total of 1,536 virtual cores in cluster.
• 4 X 256GB RAM
• 2 X 1024GB RAM
• 2 X 256GB RAM with NVIDIA K40 Tesla
Model 8247-22L and 8247-42L
Byte order: Bi-endian
Rice RSCG PowerOmics
hardware
GPUs
Tesla K40
Stream Processors: 2880
Core Clock: 745MHz
Boost Clock(s): 810MHz, 875MHz
Memory Clock: 6GHz GDDR5
VRAM: 12GB
Single Precision: 4.29 TFLOPS
Double Precision: 1.43 TFLOPS (1/3)
Storage
• IBM GPFS Storage Server (Model 24)
• 4 X JBOD
• Total of 361 TB fast scratch disk space
• (Up to 1.4 petabytes)
• FlashSystem 840 20TB Flash
Interconnect:
• 56 Gigabit 36-port FDR IB switch
• Mellanox Next gen Connect-IB FDR Host Channel Adapters
• 10-Gigabit Ethernet
• Internet 2
Rice RSCG PowerOmics
software
Cluster management
• IBM Platform LSF, PPM, PAC, PowerKVM 2.1.0
Operating system
• Ubuntu 14.04 (little-endian) + Red Hat Enterprise Linux 7.0
Storage
• Mellanox OFED 2.4-1
• GPFS 4.1
Scientific
• BioBuilds 2014.11
Challenge: alignment of billions of contacts
High-resolution map: 13 billion reads forming 5 billion contacts in the map
IBM Power8 cluster: 675 read alignments / second / CPU core, 192 cores
About 27 hours
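The wall-clock estimate follows from the three numbers on this slide. The short sketch below redoes the arithmetic, assuming perfect scaling across cores and ignoring I/O and scheduling overhead (both are simplifying assumptions, not statements from the talk). The function name is hypothetical.

```python
def alignment_hours(total_reads, reads_per_second_per_core, cores):
    """Wall-clock hours to align all reads, assuming ideal parallel scaling."""
    seconds = total_reads / (reads_per_second_per_core * cores)
    return seconds / 3600


if __name__ == "__main__":
    hours = alignment_hours(13_000_000_000, 675, 192)
    print(f"{hours:.1f} hours")
```

The result lands just under 28 hours, in line with the slide's figure of about 27 hours.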
…CTGCCTCCTCGCGG…
Chromosome
Hi-C
GENERATES GENOME-WIDE CONTACT MAPS
Panels: Genome, Chromosome 8; axis labels A, B
Color scale: 0 to 700 Reads/250 kb²
PART II:
BIOLOGY
Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Erez Lieberman-Aiden, Nynke van Berkum et al. Science, 2009
Genomic analysis of compartments
Genes
Chromosome 14, 1 Mb² pixels
The two compartments correlate strongly with open and closed chromatin
100 kb² pixels
The whole genome is plaid
Panels: chromosomes 1 to 22 and X
A TOUR OF THE NUCLEUS
Organization
observed at three distinct scales
NUCLEAR SCALE: 100 Mb
CHROMOSOME SCALE: 10 Mb
MEGABASE SCALE: 1 Mb
A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping Suhas Rao*, Miriam Huntley*, Neva Durand, Elena Stamenova, Ivan Bochkov, James Robinson, Adrian Sanborn, Ido Machol, Arina Omer, Eric Lander, Erez Lieberman Aiden Cell 2014
5 billion contacts
30 million contacts
More Contacts, Higher Resolution
Detection of Chromatin Loops Genome-wide via Hi-C
A-2ε  A-ε  A  A+ε  A+2ε
B-2ε  B-ε  B  B+ε  B+2ε
Into the loops
L3 L2 L1
L1 L2 L3
Computational Challenge III
Loop calling
Which one shows a loop?
X
✔
X
3D Map Features
X
Computational Challenge III
Loop calling
• Apply 4 filters to each pixel.
• 20-gigapixel image.
• Millions of parallel filters.
NVIDIA Tesla GPU 200x faster than previous CPU implementation – from 3 weeks to 3 hours.
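To illustrate why this workload suits a GPU, here is a minimal CPU sketch of per-pixel filtering. Each pixel is compared against the mean of four local neighborhoods and kept only if it is enriched over all of them. The four neighborhood shapes (ring, horizontal, vertical, lower-left), the window size `w`, and the `fold` threshold are assumptions chosen for illustration; the slide only states that 4 filters are applied per pixel. Every pixel is evaluated independently, so the two loops could in principle be mapped onto one GPU thread per pixel.

```python
def make_filters(w):
    """Four hypothetical neighborhoods, as (row, column) offsets from a pixel."""
    ring = [(di, dj) for di in range(-w, w + 1) for dj in range(-w, w + 1)
            if (di, dj) != (0, 0)]
    horizontal = [(0, dj) for dj in range(-w, w + 1) if dj != 0]
    vertical = [(di, 0) for di in range(-w, w + 1) if di != 0]
    lower_left = [(di, dj) for di in range(1, w + 1) for dj in range(-w, 0)]
    return [ring, horizontal, vertical, lower_left]


def neighborhood_mean(m, i, j, offsets):
    """Mean of the in-bounds pixels at the given offsets around (i, j)."""
    vals = [m[i + di][j + dj] for di, dj in offsets
            if 0 <= i + di < len(m) and 0 <= j + dj < len(m[0])]
    return sum(vals) / len(vals) if vals else 0.0


def call_peaks(m, w=2, fold=2.0):
    """Return pixels that exceed `fold` times the mean of every neighborhood."""
    filters = make_filters(w)
    peaks = []
    for i in range(len(m)):
        for j in range(len(m[0])):
            if all(m[i][j] > fold * neighborhood_mean(m, i, j, f)
                   for f in filters):
                peaks.append((i, j))
    return peaks


if __name__ == "__main__":
    toy = [[1] * 7 for _ in range(7)]
    toy[3][3] = 10  # a single enriched pixel in a flat background
    print(call_peaks(toy))
```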
10,000 Loops in the Human Genome
Loops turn genes on and off
Lung fibroblast cell Lymphoblastoid cell
SUMMARY OF
COMPUTATIONAL
EFFORTS
Sequence alignment
proportions
Genome data production and analysis
• In about 36 months we produced the sequence equivalent of more than 2,200x coverage of the human genome.
• For reference, the Human Genome Project produced 12.6x coverage over the span of 4 years.
Storage
• We currently have 25 TB of raw sequenced data.
• We sequence 1 TB each month.
• After processing the raw sequenced data, we store 3 TB of raw and processed data.
Computational speed up
Cluster processing
• We produce 1 billion reads per month.
• Power8 is capable of processing alignments at 675 reads/second per CPU core.
• 50% faster than the cluster system we were using before.
• At this speed, we consume about 17 “CPU days” per month.
• With the Power8 cluster having over 192 cores, the jobs complete processing in about 2 hours.
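The monthly figures above can be checked with a few lines of arithmetic. The sketch assumes ideal scaling across 192 cores and no overhead, which is a simplification; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def monthly_alignment_cost(reads_per_month, reads_per_second_per_core, cores):
    """Return (CPU-days, wall-clock hours) for one month of reads."""
    cpu_seconds = reads_per_month / reads_per_second_per_core
    cpu_days = cpu_seconds / SECONDS_PER_DAY
    wall_hours = cpu_seconds / cores / 3600
    return cpu_days, wall_hours


if __name__ == "__main__":
    cpu_days, wall_hours = monthly_alignment_cost(1_000_000_000, 675, 192)
    print(f"{cpu_days:.1f} CPU-days, {wall_hours:.1f} wall-clock hours")
```

Both values agree with the bullets: roughly 17 CPU-days, or about 2 hours when spread over 192 cores.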
GPU processing
• Using an NVIDIA Tesla K40, we run our loop calling algorithm over a 20-gigapixel map 200x faster than the CPU implementation.
• Instead of 3 weeks we get the work done in only 3 hours.
aidenlab.org/juicebox
Aiden Lab
Erez Lieberman Aiden
Suhas Rao
Miriam Huntley
Neva C Durand
Elena Stamenova
Adrian Sanborn
Arina Omer
Ivan Bochkov
Olga Dudchenko
Robert Nnake
Su-Chen Huang
Muhammad Shamim
Chris Lui
Sarah Nyquist
Sanjit Batra
Ashok Cutkosky
Najeeb Tarazi
Jian Li
Broad Institute
Eric Lander
Jim Robinson
GREETINGS FROM
ANOTHER DIMENSION