Genboree Microbiome Workbench 16S Workshop Part I
March 11th, 2014
Julia Cope
Emily Hollister
Kevin Riehle
Genboree Workflow
• Create Group
• Create Database
• Create Project
• Upload Files
• Create Samples (Sample Import using metadata file)
• Link Samples to Sequence Files (Sample File Linker)
• QC and Attach Sequences (Sequence Import)
• QIIME
• RDP
Data Analysis - QIIME
• How to select samples for analysis
• Chimera removal and why you should be thinking about it
• Output
– downloading and organization
– making sense of the files
Data Analysis - QIIME
• How to select samples for analysis
Data Analysis - QIIME
– Selecting samples for analysis
• INPUT = One or more Sequence Import folders
– All should be of the same variable region; ideally produced with the same primer and sequencing direction
• OUTPUT Targets = Your database (required), your project (optional)
Data Analysis - QIIME
Caveats:
• All samples in your input folder will be analyzed
– This includes no-template controls and positive controls
– The % variation explained by your PCoA may be influenced by the inclusion of these samples
• QIIME on Genboree is not currently set up to allow users to subsample their data
– This can be problematic if sequencing depth varies substantially across samples
– It does, however, perform a “rounding up” normalization step
A bit about sequencing depth
How deep should you go?
There is no good answer
Strong biological patterns can be detected with low sequencing depth
– 10s to 100s of sequences can sometimes be enough
– 1000s tend to be the norm
Subtle biological patterns tend to require greater sequencing depth for detection
Sequencing depth can be dictated by:
– Sample quality
– The number of samples placed on a run
– Project budget
Kuczynski et al. 2010 Nature Methods 7: 813-819
Unequal sequencing depth
What’s the problem?
http://www.cs.unc.edu/~lguan/Research.files/backgroundSubtractionResult.JPG
Being certain that you are seeing the full view of your communities (…or at least equivalent glimpses of them)
Unequal sequencing depth
What’s the problem?
Unequal depth
Avg Red = 5995 seqs
Avg Blue = 11672 seqs
Same data set
Samples are colored by library size
Red ~4000
Orange ~5000
Yellow ~6000
Green 8,000-10,000
Blues 11,000-17,000
Unequal sequencing depth
What’s the problem?
Unequal depth
Avg Red = 5995 seqs
Avg Blue = 11672 seqs
Equal depth
All libraries were sub-sampled to ~4000 reads.
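Sub-sampling to equal depth can be sketched in a few lines of Python. This is only an illustration of the general idea (random draws without replacement down to the smallest library size); it is not the procedure used to make the figure, and the function name, OTU labels, and counts below are hypothetical.

```python
import random

def subsample_counts(otu_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"sample has only {total} reads; cannot subsample to {depth}")
    # Expand the counts into one label per read, then sample without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {otu: 0 for otu in otu_counts}
    for otu in picked:
        rarefied[otu] += 1
    return rarefied

# Hypothetical libraries with unequal depth (4000 vs 11672 reads).
shallow = {"OTU_1": 2500, "OTU_2": 1200, "OTU_3": 300}
deep = {"OTU_1": 7000, "OTU_2": 3500, "OTU_3": 1100, "OTU_4": 72}

# Bring every library down to the size of the smallest one.
target = min(sum(shallow.values()), sum(deep.values()))
even_shallow = subsample_counts(shallow, target)
even_deep = subsample_counts(deep, target)
print(sum(even_shallow.values()), sum(even_deep.values()))  # 4000 4000
```

Rare OTUs (OTU_4 here) may drop out of the deeper library after sub-sampling, which is the trade-off for comparing samples on an equal footing.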
Data Analysis - QIIME
• Chimera removal and why you should be thinking about it
– What is a chimeric sequence?
– How frequently do they occur?
– An example from real data
– Why should you think about chimeras?
– How to screen for chimeras using Genboree
What is a Chimeric Sequence?
– In Greek mythology:
• A creature that was an amalgam of multiple animals
• Body of a lion, head of a goat, tail resembling a snake
– In your sequence data:
• The combination of multiple sequences during PCR to create a hybrid
– In sequence databases:
• A not-so-small nightmare of junk data
• Mis-annotation
• Enhanced “discovery” of novel organisms
Chimera generation figure from: Haas et al. 2011, Genome Research 21:494-504
How frequently do chimeras occur?
– Schloss et al. 2011:
• With mock communities of known composition:
• ~8% of raw sequences were chimeric
• Incidence increased with sequencing depth
– Approaches for detection:
• Multiple algorithms available
• Genboree uses ChimeraSlayer
– How it works:
• The ends of each read (~30% of total length) are compared to a chimera-free reference database
• Potential “parent” sequences are identified
• Identity of the potential chimera to an in silico chimera is evaluated
Schloss et al. 2011 PLoS ONE 6(12):e27310
Non-chimera
Query     AATCGCGACCTGTTTAACCGTAGGTC
Parent 1  AATCGCGACCTGTTTAACCGTAGGTC
Parent 2  AAACGCTTACGGAGCTACACGAGTC
Likely Chimera
Query     AATCGCGACCTGTGCTACACGGGTA
Parent 1  AATCGCGACCTGTTTAACCGTAGGTC
Parent 2  AAACGCTTACGGAGCTACACGGGTA
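The three "how it works" steps can be illustrated with a toy Python sketch using the example sequences from this slide. This is a heavily simplified stand-in, not ChimeraSlayer's actual implementation: it compares unaligned strings position by position, and all function names and the 30% default are assumptions made for illustration.

```python
def end_identity(query, ref, n, from_start):
    """Fraction of matching bases over the first (or last) n positions."""
    n = min(n, len(query), len(ref))
    q = query[:n] if from_start else query[-n:]
    r = ref[:n] if from_start else ref[-n:]
    return sum(a == b for a, b in zip(q, r)) / n

def best_parent(query, references, n, from_start):
    """Name of the reference whose end is most similar to the query's end."""
    return max(references,
               key=lambda name: end_identity(query, references[name], n, from_start))

def screen_read(query, references, end_fraction=0.3):
    # Step 1: compare each end of the read (~30% of its length) to the references.
    n = max(1, int(len(query) * end_fraction))
    left = best_parent(query, references, n, from_start=True)
    right = best_parent(query, references, n, from_start=False)
    # Step 2: if both ends point to the same parent, there is nothing to explain.
    if left == right:
        return "non-chimera"
    # Step 3: build in silico chimeras from the two parents at every cut point
    # and check whether any of them explains the read better than a single parent.
    best_chimera = 0.0
    for cut in range(1, len(query)):
        candidate = references[left][:cut] + references[right][-(len(query) - cut):]
        if len(candidate) == len(query):
            best_chimera = max(best_chimera,
                               end_identity(query, candidate, len(query), True))
    best_single = max(end_identity(query, ref, len(query), from_start)
                      for ref in references.values()
                      for from_start in (True, False))
    return "likely chimera" if best_chimera > best_single else "non-chimera"

refs_a = {"Parent 1": "AATCGCGACCTGTTTAACCGTAGGTC",
          "Parent 2": "AAACGCTTACGGAGCTACACGAGTC"}
refs_b = {"Parent 1": "AATCGCGACCTGTTTAACCGTAGGTC",
          "Parent 2": "AAACGCTTACGGAGCTACACGGGTA"}

print(screen_read("AATCGCGACCTGTTTAACCGTAGGTC", refs_a))  # non-chimera
print(screen_read("AATCGCGACCTGTGCTACACGGGTA", refs_b))   # likely chimera
```

In the second case the left end of the query matches Parent 1 and the right end matches Parent 2, and a hybrid of the two reproduces the query exactly, which neither parent does on its own.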
An example from real data
Chimeric alignment from: Haas et al. 2011, Genome Research 21:494-504
Alignment of chimeric sequences derived from Streptococcus (top, red) and Staphylococcus (bottom, black).
Sequences were generated from 4 replicate PCR reactions/454 runs of V3V5 sequence.
Why should you think about chimeras?
– Spurious results
• Artificially increased estimates of richness and diversity
• You may discover a “new” (but fake) species
– Should you trust all flagged chimeras?
• Most people do, but… buyer beware
• False-positive rates are in the 1-4% range
• Some taxa are poorly represented in reference databases
• Prevotella and Acinetobacter are known to produce false-positive results in ChimeraSlayer
– How to verify (digging into your QIIME output)
• Obtain representative sequence(s) and verify their identity (e.g., BLAST vs. NCBI nt database, RDP SeqMatch)
Sogin et al 2006 PNAS 103:12115-12120
How to screen chimeras in Genboree
– Run a QIIME job
• INPUT = Sequence Import folder
• OUTPUT Targets = Your database (required), your project (optional)
How to screen chimeras in Genboree
– Select “Remove Chimeras” in the Tool Settings dialogue box
• Provide a study name
• Provide a job name (TIP: add chimeras_removed to your job name so that your output reflects that you selected this option)
• Click SUBMIT
Data Analysis - QIIME
• Output
– downloading and organization
– making sense of the files
How do I get my files out?
– Entire folders can be archived/downloaded
• INPUT = Folder to be archived
• OUTPUT = Database to house archive
How do I get my files out?
– Entire folders can be archived/downloaded
• Provide an archive name
• Choose your compression type
• Decide if you want the directory structure to be preserved
• SUBMIT
How do I get my files out?
– Single files, including archives, can be downloaded one by one
• Click on your file of interest in the DATA SELECTOR window
• Click on the “Click to Download File” link in the DETAILS window
• Save the file to your computer or storage drive
• Most file types will require decompression
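Once an archive is on your computer, the decompression step can be done with any archive tool. As one option, here is a minimal Python sketch using the standard-library tarfile module; it assumes a tar-based archive (such as the .tar.gz result files), and the function name and paths are hypothetical.

```python
import tarfile
from pathlib import Path

def unpack_archive(archive_path, dest_dir):
    """Extract a tar archive (gzip or bzip2 compressed) and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression type on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names

# Example call (paths are placeholders for wherever you saved the download):
# unpack_archive("plots.result.tar.gz", "qiime_output/plots")
```

Only unpack archives you downloaded yourself from a source you trust, since extracting a tar file writes whatever paths it contains.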
QIIME – making sense of the files
– fasta.result.tar.gz
– jobFile.json
– mapping.txt
– otu.table
– phylogenetic.result.tar.gz
– plots.result.tar.gz
– raw.results.tar.gz
– repr_set.fasta.ignore
– sample.metadata
– settings.json
– taxonomy.result.tar.gz
QIIME – making sense of the files
– fasta.result.tar.gz: multiple sequence alignment of your representative sequences file. Rep seqs = representative sequence for each OTU.
– jobFile.json: a log of the settings used by Genboree to run your analysis
– mapping.txt: a QIIME-compatible metadata file, includes barcode information
– otu.table: a spreadsheet of OTU by sample distributions
– phylogenetic.result.tar.gz: a phylogenetic tree of your rep seqs, additional files required for iTOL
– plots.result.tar.gz: figures, html files for all PCoA plots produced in your QIIME run
– raw.results.tar.gz: mapping file, otu table, rep seqs file, distance matrices underlying all PCoA calculations
– repr_set.fasta.ignore: RDP classification (with confidence scores) of each rep seq
– sample.metadata: like the mapping.txt file, with additional file locations for Genboree
– settings.json: similar to the jobFile.json file
– taxonomy.result.tar.gz: taxonomic summaries (per sample, at the Kingdom, Phylum, Class, Order, Family, and Genus levels)
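To take a quick look at mapping.txt outside of Genboree, a short Python sketch like the one below may help. It assumes the usual QIIME 1 mapping layout (tab-delimited, header line starting with #SampleID); the sample IDs, barcodes, and the Treatment column in the example are made up, and your file may carry different columns.

```python
import io

def read_mapping(handle):
    """Parse a QIIME-style mapping file into {sample_id: {column: value}}."""
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    # The header is the first line; drop its leading '#'.
    header = lines[0].lstrip("#").split("\t")
    samples = {}
    for line in lines[1:]:
        if line.startswith("#"):  # skip comment lines
            continue
        record = dict(zip(header, line.split("\t")))
        samples[record[header[0]]] = record
    return samples

# Hypothetical file contents for illustration only.
example = io.StringIO(
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n"
    "S1\tACGTACGT\tCCGTCAATTC\tcontrol\tmock_sample_1\n"
    "S2\tTGCATGCA\tCCGTCAATTC\ttreated\tmock_sample_2\n"
)
mapping = read_mapping(example)
print(mapping["S1"]["BarcodeSequence"])  # ACGTACGT
```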
Genboree Workflow
• Create Group
• Create Database
• Create Project
• Upload Files
• Create Samples (Sample Import using metadata file)
• Link Samples to Sequence Files (Sample File Linker)
• QC and Attach Sequences (Sequence Import)
• QIIME
• RDP
Data Analysis - RDP
• How to select samples
• Output
– downloading and organization
– making sense of the files
Data Analysis - RDP
– Selecting samples for analysis
• INPUT = One or more Sequence Import folders
– All should be of the same variable region; ideally produced with the same primer and sequencing direction
• OUTPUT Targets = Your database (required), your project (optional)
Data Analysis - RDP
Caveats:
• All samples in your input folder will be analyzed
– This includes no-template controls and positive controls
• RDP on Genboree does not pre-filter for chimeric sequences
• RDP on Genboree is not currently set up to allow users to subsample their data
– Depending on your application, this may be problematic if sequencing depth varies substantially across samples
– It does, however, perform a “rounding up” normalization step and presents data on a relative abundance basis
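Presenting data on a relative abundance basis means dividing each taxon's count by the sample total, so that samples of different depth can be read on the same scale. A minimal Python sketch of that idea follows; it does not reproduce the “rounding up” step, which is not described here, and the taxon counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert one sample's raw taxon counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}

# Two hypothetical samples with the same composition but a 3x difference in depth.
sample_a = {"Bacteroidetes": 3000, "Firmicutes": 1000}
sample_b = {"Bacteroidetes": 9000, "Firmicutes": 3000}

print(relative_abundance(sample_a))  # {'Bacteroidetes': 0.75, 'Firmicutes': 0.25}
print(relative_abundance(sample_b))  # {'Bacteroidetes': 0.75, 'Firmicutes': 0.25}
```

Relative abundances hide the difference in depth but do not remove it: rare taxa are still more likely to be detected in the deeper sample.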
How do I get my files out?
– Entire folders can be archived/downloaded
• INPUT = Folder to be archived
• OUTPUT = Database to house archive
How do I get my files out?
– Entire folders can be archived/downloaded
• Provide an archive name
• Choose your compression type
• Decide if you want the directory structure to be preserved
• SUBMIT
How do I get my files out?
– Single files, including archives, can be downloaded one by one
• Click on your file of interest in the DATA SELECTOR window
• Click on the “Click to Download File” link in the DETAILS window
• Save the file to your computer or storage drive
• Most file types will require decompression
RDP – making sense of the files
– domain.result.tar.gz
– phylum.result.tar.gz
– class.result.tar.gz
– order.result.tar.gz
– family.result.tar.gz
– genus.result.tar.gz
– sample.metadata
– settings.json
– count.result.tar.gz
– count.xlsx
– count_normalized.xlsx
– weighted.xlsx
– weighted_normalized.xlsx
– png.result.tar.gz
RDP – making sense of the files
– domain.result.tar.gz, phylum.result.tar.gz, class.result.tar.gz, order.result.tar.gz, family.result.tar.gz, genus.result.tar.gz: Per sample summaries at various taxonomic levels, including raw counts and weighted values
– sample.metadata
– settings.json
– count.xlsx, count_normalized.xlsx: Per sample summaries at various taxonomic levels, raw counts or relative abundances (normalized)
– weighted.xlsx, weighted_normalized.xlsx: Per sample summaries at various taxonomic levels, weighted by confidence of ID assignments (raw counts or normalized)
– png.result.tar.gz: All of the plots produced during your run (e.g., heatmaps, stacked bar graphs)
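The difference between raw and confidence-weighted summaries can be pictured with a small Python sketch. How Genboree computes its weighted values is not spelled out here, so the sketch assumes one plausible reading: each read contributes 1 to the raw count and its classifier confidence score to the weighted count. The function name, taxa, and scores are hypothetical.

```python
from collections import defaultdict

def summarize_assignments(assignments):
    """Return (raw_counts, weighted_counts) from (taxon, confidence) pairs."""
    raw = defaultdict(int)
    weighted = defaultdict(float)
    for taxon, confidence in assignments:
        raw[taxon] += 1                # every read counts once
        weighted[taxon] += confidence  # assumption: reads count by their confidence
    return dict(raw), dict(weighted)

# Hypothetical classifications for three reads from one sample.
reads = [("Bacteroides", 1.0), ("Bacteroides", 0.5), ("Prevotella", 0.25)]
raw, weighted = summarize_assignments(reads)
print(raw)       # {'Bacteroides': 2, 'Prevotella': 1}
print(weighted)  # {'Bacteroides': 1.5, 'Prevotella': 0.25}
```

Under this reading, taxa supported mostly by low-confidence assignments shrink in the weighted tables relative to the raw counts.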
Individual Time
• Confirm user accounts are created.
• Confirm users know where the mock data or their own data sets are.