Xin Zhou - Saturday closing plenary
Post on 25-May-2015
Taxon diversity analysis for bulk insect samples using the Illumina Hi-Seq platform
Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU
Department of Science and Technology
Environmental Genomics Research Group
BGI, China
Adelaide, Australia, 3 December 2011
Problem: solutions?
Opt. 1: ......zzzzZZZZZ
Opt. 2: morph. sorting, indiv. ID … Opt. 1
Opt. 3: morph. sorting, indiv. barcoding … Opt. 1
Opt. 4: grinding up, NGS, clustering/BLAST: DIVERSITY!
Zhou et al. 2011, 4th International Barcode of Life Conference
Environmental barcoding of bulk insects
Sample                   Marker                   Platform
aquatic insects          mini-barcode (130 bp)    454
bat diet (insects)       COI fragment, 157 bp     454
Malaise trap (insects)   COI fragment, ~400 bp    454
Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review
Major NGS platforms applicable in environmental barcoding
NGS platform                 Read length   Data/run (GB)   Run time   Library construction required
454 (GS FLX Titanium XL+)    ~400 bp       0.7             23 hr      Yes
Illumina (Hi-Seq 2000)       150 bp PE     600             14 d       Yes
Illumina (Mi-Seq)            150 bp PE     2               27 hr      Yes
Ion Torrent                  200 bp        ~1              3.5 hr     No
Illumina Hi-Seq
• higher throughput
• less $ / bp
• increasing read length
• variety of bioinformatics tools available from genomic pipelines
Sequencing capacity at BGI
• 28 Illumina GAIIx
• 137 Illumina Hi-Seq 2000
• 25 Life Tech SOLiD 4
• 16 ABI 3730XL
• 110 MegaBACEs
• 2 Illumina iScan
• 1 Roche 454
• 1 Ion Torrent
• 1 Illumina Mi-Seq
Data production:
• 100 Gb / day (2009)
• >5 Tb / day (end of 2010)
• >1500X human genome / day
What I am NOT going to talk about:
• Primer optimization
• Systematic comparisons of NGS platforms
• Quantitative diversity analysis
What I AM going to talk about:
• Can Illumina NGS be used in diversity analysis?
Can Illumina NGS be used in diversity analysis?
• Sequencing error rate
• Read length
Recent improvement in sequencing quality using Illumina’s V3 chemistry
(even at 100 bp, only about 10% of the base calls have an error rate >1%)
Sequencing error rate
• No indel issue in homopolymers
• Sequencing quality keeps increasing
• Rare nucleotide errors can be easily corrected by:
  • increasing sequencing depth
  • paired-end (PE) sequencing
  • setting stringent matching criteria in the overlapping fragment by allowing only >99% identity
[Figure: insert size 250 nt, sequenced from both ends with 150 bp reads]
PE sequencing enables forming sequence contigs
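As an illustration of the overlap criterion, here is a minimal Python sketch of merging one read pair into a contig. With a 250 nt insert and two 150 bp reads, the expected overlap is 150 + 150 - 250 = 50 bp. The function names, the `min_overlap` default, and the longest-overlap-first search are assumptions made for this sketch, not the pipeline used in the talk.

```python
# Hypothetical helper names; a sketch of paired-end merging, not the authors' tool.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=20, min_identity=0.99):
    """Merge read1 with the reverse complement of its mate.

    Overlaps are tried from longest to shortest; the first one whose
    identity is strictly above min_identity is accepted.
    Returns the merged contig, or None if no overlap qualifies.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / overlap > min_identity:
            return read1 + mate[overlap:]
    return None
```

Note that with a 50 bp overlap, a strict >99% identity threshold tolerates no mismatches at all, which is why the criterion is stringent.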
Read length
• Read length keeps increasing
• Shotgun reads can be further assembled into longer fragments (“shotgun” assembly strategy used in genome sequencing projects)
[Figure: insert size 250 nt, sequenced from both ends with 150 bp reads]
• 150 PE enables contig reads of 250 bp
• Option of scaffold assembly
Illumina environmental barcoding (Illumina e-barcoding)
PCR based: full-length COI
• Full-length COI barcode PE sequencing: Lib1 (658 bp, 150 PE)
• COI amplicons shotgun PE sequencing: Lib2 (200 bp, 150 PE)
PCR free: full-length COI without PCR bias
• Mitochondrial shotgun PE sequencing
Sample information
                             Mock           XSBN (provided by Yu et al.)
# Specimens                  23             292
# Haplotypes (2%)            12             230
Soup protocol                DNA extracted individually and mixed for PCR
PCR primers                  LepF1/LepR1    Customized
Sequence length              658 bp         700 bp
Sequencing library details   Full length (658 bp) + shotgun library (~200 bp)
Sequencing protocol          150 PE
Approach #1: PCR-based
Pre-analysis data filtering
Lib 1                            Mock        XSBN
Raw data                         1.67G       4.04G
Filtering adapter                1.60G       1.28G
High quality (Q20)               0.35G       0.50G
# Reads (primer removed)         1,081,997   1,150,477
# Unique reads (abundance > 1)   36,618      45,444
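The last row of the table (unique reads with abundance > 1) corresponds to collapsing identical reads and dropping singletons. A minimal Python sketch of that step follows; the function name and the in-memory list of reads are assumptions for illustration only.

```python
from collections import Counter


def unique_reads(reads, min_abundance=2):
    """Collapse identical reads and keep those seen at least min_abundance times.

    min_abundance=2 corresponds to "abundance > 1", i.e. singletons are dropped.
    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_abundance}
```

Dropping singletons is a cheap way to discard many reads that carry sequencing errors, at the cost of also losing genuinely rare sequences.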
Approach #1: PCR-based
OTU filtering workflow
Steps: unique reads (abundance > 1); OTU cluster (98%); remove chimera; compared to reads of Lib 2; alignment
Mock: 36,618 → 784 → 490 → 119 → 44
XSBN: 45,444 → 4,189 → 3,887 → 403 → 399
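To make the "OTU cluster (98%)" step concrete, here is a minimal Python sketch of greedy centroid clustering. It is a simplified stand-in, not the clustering program used in the talk: it assumes the sequences are already aligned to equal length and measures identity as the fraction of matching positions. All names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances, threshold=0.98):
    """Cluster sequences greedily, most abundant first.

    abundances maps sequence -> read count. Each sequence joins the first
    existing centroid it matches at >= threshold identity; otherwise it
    starts a new OTU. Returns a dict mapping centroid -> member sequences.
    """
    otus = {}
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            # No centroid was close enough: seq becomes a new centroid.
            otus[seq] = [seq]
    return otus
```

Processing the most abundant sequences first makes them the centroids, so low-abundance variants that differ by a base or two fold into an existing OTU instead of founding a new one.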
Results
Sanger reference vs. NGS OTUs (BLAST at 100% identity)
       Primers              Shared   Sanger reference only   NGS OTUs only
Mock   LepF1/R1             8        4                       36
XSBN   Customized primers   198      32                      197
Mock
Sanger reference vs. NGS OTUs: 8 shared, 4 Sanger reference only, 36 NGS OTUs only
• False negative (4 Sanger reference only): not found in raw data (likely due to primer failure)
• “False positive”? (36 NGS OTUs only): 31 can be found in our total sample, from which our mock samples were assembled; 5 likely to be PCR errors
XSBN
Sanger reference vs. NGS OTUs: 198 shared, 32 Sanger reference only, 197 NGS OTUs only
• 17 not found in raw data (primer failure)
• 15 were lost in data filtering
• Cross-sample contamination?
[Figure: Mean + SE, (group1) vs. (group2)]
Sanger reference vs. NGS OTUs (XSBN)
                                                Shared   Sanger reference only   NGS OTUs only
All OTUs                                        198      32                      197
After removal of sequences with abundance <10   181      49                      84
• Significantly fewer false positives
• Slight drop in true positives
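The trade-off described here (an abundance cutoff removes false positives but also costs some true positives) can be sketched in a few lines of Python. The function name and the toy OTU labels are hypothetical; the sketch assumes each NGS OTU has already been labelled with the reference haplotype it matches, or with a unique label if it matches none.

```python
def compare_to_reference(otu_abundance, reference_ids, min_abundance=1):
    """Count shared, reference-only and NGS-only OTUs after an abundance cutoff.

    otu_abundance maps OTU label -> read count; OTUs with fewer than
    min_abundance reads are discarded before the comparison.
    """
    kept = {otu for otu, n in otu_abundance.items() if n >= min_abundance}
    reference = set(reference_ids)
    return {
        "true_positive": len(kept & reference),
        "false_negative": len(reference - kept),
        "false_positive": len(kept - reference),
    }


# Toy data (not from the talk): two reference hits and two unmatched OTUs.
toy_otus = {"ref1": 50, "ref2": 4, "x1": 12, "x2": 3}
toy_reference = ["ref1", "ref2", "ref3"]
print(compare_to_reference(toy_otus, toy_reference))
print(compare_to_reference(toy_otus, toy_reference, min_abundance=10))
```

In the toy data, raising the cutoff to 10 removes one unmatched OTU but also loses the low-abundance `ref2`, the same pattern as in the table.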
Approach #1: PCR-based
What’s next?
• Obtaining full-length barcodes via shotgun read assembly (new program in development: “SOAPbarcode”)
• New algorithm to filter out false-positive OTUs
Illumina e-barcoding
Approach #2: PCR-free method
Individual barcoding
Total MT isolation & DNA extraction
Shotgun sequencing
Reference-based method
Reference-independent method
Building reference library: individual barcoding
1. 89 individuals;
2. 84 reference barcodes;
3. 39 OTUs (2%);
Taxon group    # OTUs
Lepidoptera    25
Diptera        7
Hemiptera      4
Hymenoptera    2
Psocoptera     1
Total          39
Total MT isolation & DNA extraction
Sample mixture → Total MT isolation → MT DNA extraction
Shotgun sequencing
Insert size: 200 bp; read length: 100 bp PE
                                     Percentage of base pairs
Q20 (sequencing error rate < 1%)     96.2%
Q30 (sequencing error rate < 0.1%)   92.9%
GC content                           38.0%
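The Q20 and Q30 labels follow the standard Phred definition, Q = -10 log10(p), where p is the probability that a base call is wrong. A short Python sketch of the conversion, with hypothetical function names:

```python
def phred_to_error(q):
    """Convert a Phred quality score to the base-call error probability."""
    return 10 ** (-q / 10)


def fraction_at_least(qualities, q_min):
    """Fraction of base calls whose quality score is at least q_min."""
    return sum(q >= q_min for q in qualities) / len(qualities)
```

Under this definition Q20 corresponds to an error probability of 1% and Q30 to 0.1%, so the table rows report the share of bases at or above each of those quality levels.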
Pre-analysis
Raw data                      2.45G
After filtering               2.20G
Ratio of high quality reads   89.91%
Data filtering:
1. Adaptor contamination removal;
2. Quality control: in each read, only allowing <10 bp with seq. error rate >1%
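The quality-control rule can be sketched in Python as a per-read check on a FASTQ quality string. An error rate above 1% corresponds to a Phred score below 20. The function names are hypothetical, and the Phred+33 ASCII offset is an assumption; older Illumina pipelines used an offset of 64, so the `offset` parameter would need to match the data.

```python
def low_quality_bases(quality_string, q_min=20, offset=33):
    """Count bases whose Phred score is below q_min (error rate above 1% for Q20)."""
    return sum((ord(ch) - offset) < q_min for ch in quality_string)


def passes_filter(quality_string, max_low=10, q_min=20, offset=33):
    """Keep a read only if it has fewer than max_low low-quality bases."""
    return low_quality_bases(quality_string, q_min, offset) < max_low
```

For example, a 100 bp read with nine bases below Q20 is kept, while one with ten such bases is discarded.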
Approach #2: PCR-free method
Method 1: Reference based
BLAST reads to reference barcodes; a confident identification is made only when:
1. Best BLAST hit >98% identity;
2. Reference coverage >90%;
[Figure: Reference 1, correct mapping, coverage 100%; Reference 2, incorrect mapping, coverage 30%]
Taxon group    # OTUs
Lepidoptera    20
Diptera        2
Hemiptera      3
Psocoptera     1
Total          26
Not found      13
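The two criteria can be expressed as a small Python sketch. It assumes read-to-reference alignments have already been parsed into `(start, end, identity)` tuples with 0-based, half-open coordinates on the reference; that representation and the function names are hypothetical, and applying the identity threshold per alignment is a simplification of the "best BLAST hit" rule.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of the reference covered by the union of (start, end) intervals."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / ref_length


def confident_hit(hits, ref_length, min_identity=0.98, min_coverage=0.90):
    """Apply both criteria: identity above 98% and reference coverage above 90%."""
    good = [(s, e) for s, e, ident in hits if ident > min_identity]
    return covered_fraction(good, ref_length) > min_coverage
```

The coverage criterion is what separates the two cases in the figure: reads that tile a 658 bp reference end to end pass, while reads that pile up on only 30% of it do not, even at high identity.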
Potential sources of failure in detecting taxa
Taxon specific or bio-mass (size & number)?
Failures in taxon detection
Taxon bias?
Taxon groups   # Total OTUs   # OTUs missing (undetected)
Lepidoptera    25             5
Diptera        7              5
Hymenoptera    2              2
Hemiptera      4              1
Psocoptera     1              0
Total          39             13
Failures in taxon detection
OR bio-mass (body size, # individuals)?
• Readily detected: average length > 5 mm
• Missing: average length < 5 mm
Approach #2: PCR-free method
Method 2: Reference independent
(Will we be able to identify diversity without reference MT genomes for the targeted species?)
Workflow:
1. Assembly of COI gene using a genome assembly program (SOAPdenovo);
2. Annotation using ~240 MT genomes downloaded from GenBank;
PCR-free reference-independent: results
• 23/31 falling in the standard COI barcode region (mostly >600 bp);
• 1 of 23 is not represented in our reference barcodes (Insecta; Lepidoptera; Pyralidae);
• Multiple genes obtained simultaneously;
• 1 nearly complete mitochondrial genome (~15k bp);
• 3 fragments >6000 bp;
Number of individuals we collected: 89 individuals
• 5 individuals failed in Sanger sequencing
Barcode references: 39 OTUs (84 individuals)
Reference based: 26 OTUs
Reference independent: 23 OTUs
• 3 OTUs not detected in the reference-independent method because:
  (1) sequencing depth is too low (<10X) to allow for reliable assembly
  (2) relatively small body size
PCR-free method
Multiple MT genes obtained simultaneously
Gene    Number
ATP6    29
ATP8    4
COX1    31
COX2    33
COX3    31
CYTB    31
ND1     35
ND2     34
ND3     24
ND4     30
ND4L    16
ND5     30
ND6     24
PCR-free method
• 1 nearly complete mitochondrial genome (~15k bp);
• 3 fragments longer than 6k bp;
[Figure label: Barcode region]
Approach #2: PCR-free method
What’s next?
Currently:
• MT DNA 5-10% after isolation;
• Non-targeting DNA affects MT assembly (e.g., bacteria & genomic DNA);
• Taxonomic/biomass bias
Potential solutions:
1. Wet-lab protocol optimization: pre-sorting insects by body size; alternative MT isolation methods
2. Increase sequencing depth
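A rough back-of-the-envelope calculation shows why the MT fraction and sequencing depth matter. The Python sketch below estimates the expected coverage of one taxon's mitochondrial genome. The formula and the `taxon_share` value are illustrative assumptions; the other inputs (2.20G of filtered data, 5% MT DNA, a ~15 kb mitochondrial genome, a 10X threshold for reliable assembly) are taken from the slides.

```python
def expected_depth(total_bases, mt_fraction, taxon_share, mt_genome_length):
    """Expected coverage of one taxon's mitochondrial genome.

    total_bases: sequenced bases after filtering
    mt_fraction: share of the data that is mitochondrial
    taxon_share: share of mitochondrial DNA from this taxon (hypothetical input)
    mt_genome_length: mitochondrial genome size in bp
    """
    return total_bases * mt_fraction * taxon_share / mt_genome_length


# A taxon contributing 0.1% of the mitochondrial DNA (an assumed value):
depth = expected_depth(2.2e9, 0.05, 0.001, 15_000)
print(f"{depth:.1f}X")  # about 7.3X, below the 10X threshold
```

Under these assumptions a small-bodied taxon contributing 0.1% of the mitochondrial DNA falls short of 10X, so either a higher MT fraction or more data is needed to recover it.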
Conclusions
• Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
• Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
• It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
• Methodology optimization is much needed in many details of the pipeline;
• Collaborative and synergistic efforts made by the community would greatly advance progress.
Acknowledgements
Collaborators:
• Douglas W. Yu, Kunming Institute of Zoology, Chinese Academy of Sciences
• Mehrdad Hajibabaei, Shadi Shokralla, University of Guelph
• Owain Edwards, CSIRO Ecosystem Sciences
LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing
Funder:
Thanks for your attention!