QIIME: Quantitative Insights Into Microbial Ecology (part 2)
Thomas Jeffries, Federico M. Lauro, Grazia Marina Quero, Tiziano Minuzzo
The Omics Analysis Sydney Tutorial
Australian Museum 23rd-24th February 2015
Recap
• Rarefied, Chimera Filtered O.T.U. table:
Summarize Taxonomy
Use for β-Diversity e.g. Bray-Curtis clustering in PRIMER
• Phylogenetic tree
Use for phylogenetic β-Diversity e.g. UniFrac
Visualizing diversity 1 – community composition
• Summarizes taxa at hierarchical taxonomic levels:
summarize_taxa_through_plots.py -i otu_table_even146.biom -o wf_taxa_summary -m my_mapping_file.txt
Hands on – community composition
What taxa are present?
summarize_taxa_through_plots.py -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table_even138.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/wf_taxa_summary -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt
Visualizing diversity 1 – community composition
• Input your final O.T.U. table and your mapping file
• Summarizes taxa relative abundance at hierarchical Linnaean taxonomic levels (K, P, C, O, F, G, S) and writes them out as spreadsheets
• Can open these in Excel, R, PRIMER etc. and do what you want with them
summarize_taxa.py -i otu_table_even.biom -o /taxa -m my_mapping_file.txt
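As a rough illustration of what the taxa summary spreadsheets contain, the sketch below collapses a toy O.T.U. table to a chosen taxonomic level and converts the counts to relative abundance per sample. The O.T.U. IDs, counts, taxonomy strings and the function name summarize_at_level are all made up for illustration; this is not QIIME's own code.

```python
from collections import defaultdict

# Hypothetical toy data: OTU -> (taxonomy string, counts per sample).
otu_table = {
    "OTU1": ("k__Bacteria;p__Firmicutes;c__Bacilli", {"S1": 30, "S2": 10}),
    "OTU2": ("k__Bacteria;p__Firmicutes;c__Clostridia", {"S1": 20, "S2": 30}),
    "OTU3": ("k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria", {"S1": 50, "S2": 60}),
}


def summarize_at_level(table, level):
    """Sum counts of OTUs that share the first `level` ranks,
    then divide by each sample's total to get relative abundance."""
    totals = defaultdict(lambda: defaultdict(float))
    sample_sums = defaultdict(float)
    for taxonomy, counts in table.values():
        # Level 1 = kingdom, 2 = phylum, 3 = class, and so on.
        taxon = ";".join(taxonomy.split(";")[:level])
        for sample, n in counts.items():
            totals[taxon][sample] += n
            sample_sums[sample] += n
    return {
        taxon: {s: n / sample_sums[s] for s, n in per_sample.items()}
        for taxon, per_sample in totals.items()
    }


phylum_summary = summarize_at_level(otu_table, level=2)
for taxon, abundances in sorted(phylum_summary.items()):
    print(taxon, abundances)
```

Changing level from 2 to 3 gives the class-level table instead, which is the same idea as the separate spreadsheets written for each taxonomic level.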
Visualizing diversity 2 – phylogenetic beta-diversity
• β-diversity compares diversity between the samples in your study
• i.e. make a matrix of overall similarity between each pair of samples, which can then be visualized: what diversity is shared?
• Abundance-based metrics, e.g. Bray-Curtis (differences in rank-abundance of taxa): I generally compute these in PRIMER etc., but it can be done in QIIME
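To make the idea of a between-sample matrix concrete, here is a minimal sketch of the Bray-Curtis dissimilarity mentioned above, computed for every pair of samples. The sample names and counts are invented, and the standard formula (sum of absolute count differences divided by the sum of all counts) is used without any of the transformations you might apply first in PRIMER.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length:
    sum(|a_i - b_i|) / sum(a_i + b_i). 0 = identical, 1 = nothing shared."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts of three taxa in four samples.
samples = {
    "S1": [10, 0, 5],
    "S2": [10, 0, 5],
    "S3": [0, 15, 0],
    "S4": [5, 5, 5],
}

names = sorted(samples)
# Full pairwise matrix, stored as a dict keyed by (sample, sample).
matrix = {(p, q): bray_curtis(samples[p], samples[q]) for p in names for q in names}

for p in names:
    print(p, [round(matrix[(p, q)], 3) for q in names])
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form that clustering and ordination tools expect.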
Visualizing diversity 2 – phylogenetic beta-diversity
• Divergence-based measures: communities are considered more related if the taxa they contain are more closely related.
• UniFrac (qualitative): measures the phylogenetic distance between sets of taxa in a tree, as the fraction of the total phylogenetic branch length that is unique to one sample or the other rather than shared between them (Lozupone et al, 2011, ISMEJ)
• Weighted UniFrac (quantitative): variation of UniFrac that accounts for changes in the relative abundance of lineages between communities
• Why do we care about UniFrac?
• Because abundance differences between closely related taxa may not have as big an implication as diversity shifts in more divergent taxa (i.e. most metrics treat every taxon equally)
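A small worked example may help here. The sketch below computes unweighted (qualitative) UniFrac on a toy four-tip tree, ((A:1,B:1):1,(C:1,D:1):1). Storing the tree as a list of (branch length, tips below that branch) pairs is a simplification chosen for this illustration, and the function name is hypothetical; QIIME works from a real tree file instead.

```python
# Each branch of the toy tree: (length, set of tips descending from it).
branches = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (1.0, {"A", "B"}),  # branch leading to the A+B clade
    (1.0, {"C"}),
    (1.0, {"D"}),
    (1.0, {"C", "D"}),  # branch leading to the C+D clade
]


def unweighted_unifrac(tree_branches, otus_a, otus_b):
    """Fraction of observed branch length that leads to taxa from
    only one of the two samples (presence/absence only)."""
    unique = 0.0
    total = 0.0
    for length, tips in tree_branches:
        in_a = bool(tips & otus_a)
        in_b = bool(tips & otus_b)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    return unique / total if total else 0.0


print(unweighted_unifrac(branches, {"A", "B"}, {"A", "B"}))
print(unweighted_unifrac(branches, {"A", "B"}, {"A", "C"}))
print(unweighted_unifrac(branches, {"A", "B"}, {"C", "D"}))
```

Two samples with taxa from entirely different clades share no branch length and sit at the maximum distance, whereas samples that differ only in close relatives share the deeper branches and end up closer together.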
Visualizing diversity 2 – phylogenetic beta-diversity
• Determine if community differences are concentrated within particular lineages of the phylogenetic tree.
• Cluster environments to determine whether there are environmental factors (such as temperature or salinity, body location) that group communities together
• It is also very discriminating and makes pretty pictures
Human microbiome project consortium, 2012, Nature
Lozupone & Knight, 2007, PNAS
Caporaso et al, 2012, PNAS
Visualizing phylogenetic beta-diversity
beta_diversity.py -i otu_table_rarefied.biom -m weighted_unifrac -o beta_div -t rep_set.tre
• Takes your final O.T.U. table and your phylogenetic tree and makes a distance matrix based on UniFrac
• Can be imported into PRIMER etc. or used in QIIME to make plots, e.g. 2D PCoA or 3D Emperor plots
beta_diversity_through_plots.py -i otu_table.biom -m my_mapping_file.txt -o wf_bdiv_even146/ -t rep_phylo.tre -e 146
• Will automate the process (note that -e rarefies, so use the unrarefied table), but the plots etc. can sometimes be dodgy depending on which Python packages you have installed
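For anyone unsure what rarefying to an even depth (the -e value) means, this is a minimal sketch of the idea: draw the same number of sequences at random, without replacement, from each sample. The counts, the fixed seed and the choice to return None for samples below the depth are assumptions made for this illustration, not a description of QIIME's implementation.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` sequences without replacement from one
    sample's OTU counts. Returns None if the sample is too shallow."""
    # Expand counts into one entry per sequence, e.g. {"OTU1": 2} -> ["OTU1", "OTU1"].
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None  # assumed behaviour here: shallow samples are dropped
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    drawn = rng.sample(pool, depth)
    return {otu: drawn.count(otu) for otu in counts}


# Hypothetical sample with 253 sequences, rarefied to 146.
sample = {"OTU1": 200, "OTU2": 50, "OTU3": 3}
rarefied = rarefy(sample, depth=146)
print(rarefied, sum(rarefied.values()))
```

Because rare O.T.U.s can drop out in the random draw, an already rarefied table should not be rarefied a second time, which is why the unrarefied table goes in when -e is used.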
Visualizing diversity 3 – phylogenetic beta-diversity
• A cool and useful way to visualize UniFrac is to use Emperor PCoA plots:
• Generate the principal coordinates from your distance matrix:
principal_coordinates.py -i beta_div.txt -o beta_div_coords.txt
• Make the plot from the coordinates and the mapping file:
make_emperor.py -i beta_div_coords.txt -m mappingfile.txt -o emperor
• Will make an .html file; open it using the Chrome browser. You can colour by any of the factors in your mapping file and also export .png etc.
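For the curious, the principal coordinates step can be sketched in a few lines. The example below does classical PCoA on a tiny distance matrix: square the distances, double-centre them (Gower centring), then find the leading eigenvector by power iteration. It returns only the first axis, uses made-up distances, and assumes the leading eigenvalue is positive; the function name is hypothetical and QIIME's own script will differ in detail.

```python
import math


def first_pcoa_axis(dist, iterations=200):
    """Coordinates of each sample on the first principal coordinate."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Gower double-centring; the matrix is symmetric, so row means = column means.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    # Power iteration for the dominant eigenvector of the centred matrix.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm  # converges to the leading eigenvalue (assumed positive)
    # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
    return [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical distances between three samples lying on a line at 0, 1 and 3.
distances = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
coords = first_pcoa_axis(distances)
print([round(c, 3) for c in coords])
```

The recovered coordinates preserve the original spacing between the samples (up to a shift and a possible sign flip), which is what lets the plot stand in for the distance matrix.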
Visualizing diversity 3 – emperor plots
core_diversity_analyses.py -o moving_pictures_tutorial-1.8.0/illumina/otus/cd258/ --suppress_alpha_diversity -i moving_pictures_tutorial-1.8.0/illumina/otus/otu_table_mc2_w_tax_no_pynast_failures.biom -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt -t moving_pictures_tutorial-1.8.0/illumina/otus/rep_set.tre -e 258 -c "SampleType,days_since_epoch"
Hands on – diversity analysis
O.T.U. networks
make_otu_network.py -i otu_table.biom -m map.txt -o otu_network
• Makes a network displaying which OTUs are shared between samples
• Input for Cytoscape: http://qiime.org/tutorials/making_cytoscape_networks.html
• Network tutorial…
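Conceptually, the O.T.U. network is a two-mode (bipartite) graph: sample nodes and O.T.U. nodes, with an edge wherever an O.T.U. was observed in a sample. The sketch below builds such an edge list from a toy table and asks which O.T.U.s two samples share. The table and function names are made up for illustration, and the real script writes Cytoscape input files instead of Python lists.

```python
# Hypothetical OTU table: OTU -> counts per sample.
otu_counts = {
    "OTU1": {"S1": 12, "S2": 3, "S3": 0},
    "OTU2": {"S1": 0, "S2": 7, "S3": 0},
    "OTU3": {"S1": 4, "S2": 0, "S3": 9},
}


def build_edges(table):
    """One (sample, OTU, count) edge for every non-zero cell of the table."""
    return [
        (sample, otu, n)
        for otu, counts in sorted(table.items())
        for sample, n in sorted(counts.items())
        if n > 0
    ]


def shared_otus(edges, sample_a, sample_b):
    """OTU nodes connected to both samples."""
    in_a = {otu for sample, otu, _ in edges if sample == sample_a}
    in_b = {otu for sample, otu, _ in edges if sample == sample_b}
    return in_a & in_b


edges = build_edges(otu_counts)
for edge in edges:
    print(edge)
print(shared_otus(edges, "S1", "S2"))
```

Samples that share many O.T.U.s are pulled together when the network is laid out, which is what makes the picture useful.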
Hands on – making a network
make_otu_network.py -i moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_table.biom -o moving_pictures_tutorial-1.8.0/illumina/otus_denovo/otu_network -m moving_pictures_tutorial-1.8.0/illumina/combined_mapping_file.txt
Visualizing in Cytoscape:
http://qiime.org/tutorials/making_cytoscape_networks.html
• That’s the core workflow… QIIME has many other functions:
• http://qiime.org/1.8.0/scripts/
• Useful functions for manipulating sequence data (eg filtering, sorting, format changes)
• Stats…
Software references:
QIIME Caporaso JG et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336.
UCLUST Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460-2461.
BLAST Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3):403-410.
Greengenes McDonald D et al. 2012. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6(3):610-618.
RDP Classifier Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73(16):5261-5267.
PyNAST Caporaso JG et al. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266-267.
ChimeraSlayer Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21:494-504.
MUSCLE Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-1797.
FastTree Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3):e9490.
UniFrac Lozupone C, Knight R. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12):8228-8235.
Emperor Vazquez-Baeza Y, Pirrung M, Gonzalez A, Knight R. 2013. Emperor: A tool for visualizing high-throughput microbial community data. GigaScience 2(1):16.
-OMICs are not the answer…
…unless you are asking the right question
Sampling design - Temporal
Sampling design - Spatial
Network Analysis Terminology
NODE – a variable
EDGE – a connection / interaction between 2 variables
MODULE – a defined set of nodes and edges
We can describe any interactions….
Network Analysis
New trends in networks:
• Scale-free networks: “hubs” are nodes with a high degree of connectivity, e.g. Google, or keystone bacteria that strongly correlate with environmental variables
• Comparative network analysis, i.e. resilience and connectivity
• What is the minimum amount of information we need to predict microbial community dynamics? … remote sensing, models etc.
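The node, edge and hub vocabulary above can be sketched in code. The example below stores a small network as a list of edges, counts the degree (number of connections) of every node, and flags nodes above a chosen degree as hubs. The node names, the edge list and the threshold of 3 are arbitrary assumptions for illustration; real analyses usually pick hubs relative to the degree distribution of the whole network.

```python
from collections import defaultdict

# Hypothetical undirected network: each pair is one edge between two nodes.
network_edges = [
    ("A", "B"),
    ("A", "C"),
    ("A", "D"),
    ("A", "E"),
    ("B", "C"),
    ("E", "F"),
]


def node_degrees(edge_list):
    """Number of edges touching each node."""
    degree = defaultdict(int)
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def find_hubs(degree, min_degree):
    """Nodes whose degree is at least `min_degree`, in alphabetical order."""
    return sorted(node for node, d in degree.items() if d >= min_degree)


degrees = node_degrees(network_edges)
hubs = find_hubs(degrees, min_degree=3)
print(degrees)
print(hubs)
```

In a microbial network the nodes could be O.T.U.s and environmental variables, with edges drawn between pairs that correlate strongly.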
Some other software to play with…
PRIMER-E http://www.primer-e.com/
PhyloSift http://phylosift.wordpress.com/
GroopM http://minillinim.github.io/GroopM/
ARB Everything you needed to know from Ramon
R http://www.r-project.org/
Cytoscape http://www.cytoscape.org/
Acknowledgements
Ziggy Marzinelli – Experimental Design
Jason Woodhouse & Mark Brown – Network Analysis
Contact:
[email protected] (UNSW – Sydney)
[email protected] (NTU – Singapore)