Lab 01 - UPARSE, phyloseq, and Shiny-phyloseq

Paul J. McMurdie

9/17/2014


Motivation/Goals of lab

This is a short section of the lab (~15 minutes) demonstrating how to run an OTU clustering method that is an alternative to the default method in QIIME. Consider it a "fork" in the sequence processing workflow, at one of the final steps toward counting the number of DNA sequence fragments (reads) that came from each bacterial species/OTU.

· Install and run phyloseq/Shiny-phyloseq
· UPARSE - Why we might want to use it
· Sequence data - Inputs for UPARSE
· check-trim-sequences.Rmd - some simple sequence processing using R
· run-uparse.sh - Open, Closed, De Novo OTU clustering
· Bonus: Try an alternate dataset, either your own or a public one.


Outline

· Describe OTU-clustering
· Running UPARSE (from usearch)
· Importing into phyloseq/R
· Heatmap of our clustering/import, Rmarkdown
· Save .RData
· Upload and run Shiny-phyloseq


Shiny-phyloseq

Shiny-phyloseq is an interactive web application that provides a graphical user interface to the microbiome analysis package for R, called phyloseq.

McMurdie and Holmes (2014) Shiny-phyloseq: Web Application for Interactive Microbiome Analysis with Provenance Tracking. Bioinformatics. In Press.


Launching Shiny-phyloseq Local Session

Simply launching Shiny-phyloseq should also install missing/old packages. Make sure that you have first installed the latest version of R.

The following R code will launch Shiny-phyloseq on most systems.

install.packages("shiny")
shiny::runGitHub("shiny-phyloseq", "joey711")


Problems? Ask me, or see the installation page:

Shiny-phyloseq installation instructions


OTU Clustering (background)

In SSU metagenomics, next-generation reads are clustered into Operational Taxonomic Units (OTUs). This requires:

· quality filtering
· dereplication
· discarding singletons (optional)
· clustering into OTUs (typically at a 97% identity threshold)

http://drive5.com/usearch/manual/otu_clustering.html
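As a rough illustration of the dereplication and singleton-discarding steps listed above, here is a minimal base R sketch. The reads are invented toy strings, and this is not how usearch implements these steps; it only shows the idea of collapsing identical reads into counts and dropping the sequences seen once.

```r
# Toy reads (invented for illustration); real reads come from a FASTA/FASTQ file
reads <- c("ACGT", "ACGT", "ACGA", "ACGT", "TTGA", "ACGA", "GGCC")

# Dereplication: collapse identical reads into unique sequences with counts,
# most abundant first
derep <- sort(table(reads), decreasing = TRUE)

# Optional step: discard singletons (unique sequences observed only once)
no_singletons <- derep[derep > 1]
no_singletons
```

Only the unique sequences that survive this step would be passed on to clustering.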


Side note: non-clustering methods

DADA denoiser:

Rosen, Callahan, Fisher, Holmes (2012) Denoising PCR-amplified metagenome data. BMC Bioinformatics 13:283


UPARSE Reference

· Edgar, R. C. (2013). UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10(10), 996–998.
· http://www.drive5.com/usearch/manual/
· UPARSE - Part of the usearch software, low-level C
· By default, QIIME uses an old version of usearch (licensing/distribution issues)


UPARSE method (very brief)

Identify a set of OTU representative sequences that satisfy:

· All pairs of OTU sequences should have < 97% pairwise sequence identity.
· Chimeric sequences should be discarded.
· All non-chimeric input sequences should match at least one OTU with ≥ 97% identity.
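To make the 97% criteria above concrete, here is a small base R sketch that checks two of them on toy sequences. It assumes the sequences are already aligned, have equal length, and contain no gaps, so identity is just the fraction of matching positions; pct_identity and mutate are hypothetical helpers written for this example, and UPARSE itself computes identity from real alignments.

```r
# Fraction of identical positions between two pre-aligned, equal-length sequences
pct_identity <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  mean(strsplit(a, "")[[1]] == strsplit(b, "")[[1]])
}

# Introduce mismatches at the given positions ("N" stands in for any other base)
mutate <- function(s, pos) {
  chars <- strsplit(s, "")[[1]]
  chars[pos] <- "N"
  paste(chars, collapse = "")
}

otu1 <- paste(rep("ACGT", 25), collapse = "")  # 100 bases
otu2 <- mutate(otu1, 1:10)                     # differs from otu1 at 10 positions
read <- mutate(otu1, 50:51)                    # differs from otu1 at 2 positions

# Criterion 1: OTU representatives are < 97% identical to each other
pct_identity(otu1, otu2) < 0.97

# Criterion 3: the read matches at least one OTU at >= 97% identity
any(c(pct_identity(read, otu1), pct_identity(read, otu2)) >= 0.97)
```

Chimera detection (the second criterion) is not sketched here, since it needs more than a pairwise comparison.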


UPARSE results/comparison

· Simulated microbiome DNA mixture from 21 bacterial species
· Assess OTU-clustering accuracy of various workflows
· QIIME produced thousands of OTUs, far more than the number of species


UPARSE Requires Global-Trimmed Reads

http://www.drive5.com/usearch/manual/global_trimming.html
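A minimal base R sketch of the idea, assuming that global trimming here means truncating every read to one fixed length and discarding reads that are shorter than that length. The reads and the trim_len value are invented for illustration; the lab's own threshold lives in check-trim-sequences.Rmd.

```r
# Toy reads of unequal length (invented for illustration)
reads <- c(r1 = "ACGTACGTAC", r2 = "ACGTAC", r3 = "ACGTACGTACGT")

# Hypothetical trimming threshold, in bases
trim_len <- 8

# Discard reads shorter than the threshold, then truncate the rest to that length
kept <- reads[nchar(reads) >= trim_len]
trimmed <- substr(kept, 1, trim_len)
trimmed
```

After this step every remaining read has the same length, which is the property the clustering step relies on.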


Run Trimming check-trim-sequences.Rmd

· Execute check-trim-sequences.Rmd in its current directory.
· Inspect the code (R), file paths, and threshold values.
· Where does the sequence-trimming happen?
· Why was that threshold chosen?
· Would you have chosen the same threshold?


Run run-uparse.sh

Use the provided bash script to execute UPARSE OTU clustering multiple times.

Can you tell which is which? What is the code doing differently?

· Open
· Closed
· De Novo

"Introduction to phyloseq: importing and manipulating data"

· Importing Data (QIIME/biom, QIIME-DB, UPARSE)
· Simple Accessors
· Simple Plotting
· Shiny-phyloseq


phyloseq Design

R packages allow modular design, flexible toolset.


Data API Diagram


Importing Data

You can always refer to the phyloseq import tutorial.

Before we discuss the "convenience functions" for popular file formats, it is important to emphasize that if you can get the data into R, then you can get it "into" phyloseq. There really is no difference. phyloseq is a specialized collection of R functions and an R data definition (class). phyloseq requires a few extra steps so that data can be recognized by its class (see previous diagram).

There are lots of ways to get data related to a microbiome project. Not all of these will come from a popular server or workflow that is already supported by phyloseq.


Importing Data - "Manually"

We encourage you to create and share your own import code for special data formats, as you come across them or have a need to create them.

phyloseq provides tools for constructing phyloseq component data, and binding it together in the experiment-level multi-component data object, the phyloseq-class. These are the same functions used internally by the currently available importers. In most cases these also have the same name as the accessors. The R interpreter knows what to do based on the argument (a process called "dispatch").

See the help files:

?phyloseq
?otu_table
?sample_data
?tax_table
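To illustrate what "dispatch" means here, the following sketch defines one generic that acts as a constructor when given a plain matrix and as an accessor when given a container object. The names toy_table, toy_experiment, and toy_otu are hypothetical and exist only for this example; this is not phyloseq's source code, just the same S4 mechanism in miniature.

```r
library(methods)

# Hypothetical toy classes, standing in for a component and an experiment-level object
setClass("toy_table", representation(counts = "matrix"))
setClass("toy_experiment", representation(tab = "toy_table"))

# One generic name, two behaviors chosen by the class of the argument
setGeneric("toy_otu", function(object) standardGeneric("toy_otu"))

# Constructor: called on a plain matrix, builds the component
setMethod("toy_otu", "matrix", function(object) new("toy_table", counts = object))

# Accessor: called on the experiment-level object, extracts the component
setMethod("toy_otu", "toy_experiment", function(object) object@tab)

m <- matrix(1:6, nrow = 2)
tab <- toy_otu(m)                         # dispatches to the constructor
exp1 <- new("toy_experiment", tab = tab)
identical(toy_otu(exp1), tab)             # dispatches to the accessor
```

This is why the same function name can appear both when building components and when pulling them back out of a combined object.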


Constructors:

· otu_table - Works on any numeric matrix. You must also specify if the species are rows or columns.
· sample_data - Works on any data.frame. The rownames must match the sample names in the otu_table if you plan to combine them as a phyloseq-object.
· tax_table - Works on any character matrix. The rownames must match the OTU names (taxa_names) of the otu_table if you plan to combine it with a phyloseq-object.
· phyloseq - Takes as arguments an otu_table and any unordered list of valid phyloseq components: sample_data, tax_table, phylo, or XStringSet. The tip labels of a phylo-object (tree) must match the OTU names of the otu_table, and similarly, the sequence names of an XStringSet object must match the OTU names of the otu_table.
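The name-matching requirements above can be checked in base R before calling any constructor. This is only a sketch with small invented tables (otumat, taxmat, and sampledf are toy objects, and taxa are assumed to be rows of the OTU matrix); it is not a phyloseq function.

```r
# Toy tables (invented for illustration); taxa are rows, samples are columns
otumat <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("OTU", 1:3), paste0("Sample", 1:2)))
taxmat <- matrix(c("a", "b", "c"), nrow = 3,
                 dimnames = list(paste0("OTU", 1:3), "Phylum"))
sampledf <- data.frame(Depth = c(10, 20), row.names = paste0("Sample", 1:2))

# Compare the index names that must agree before the pieces are combined
names_ok <- c(
  samples = setequal(rownames(sampledf), colnames(otumat)),
  taxa    = setequal(rownames(taxmat), rownames(otumat))
)
names_ok
```

If either entry is FALSE, fixing the row or column names first is usually easier than debugging a failed merge later.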


We'll create the example vanilla R tables using base R code. No packages required yet.

# Create a pretend OTU table that you read from a file, called otumat
otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
otumat

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   91   27   10   19   61   72    1   58   50    62
##  [2,]    9   96   43   39   40   72   55   45   61    47
##  [3,]   17   57   89   18   64    7    6   22    3    87
##  [4,]   51   50   16    7   79   62   48   32   17    33
##  [5,]   95   25   48   92    1   63   97   86   40    90
##  [6,]   42   53   55   49    5   51   46   22   46    43
##  [7,]   54    1   36   19   78   32   19   14   42    79
##  [8,]  100   53   36   79    8   81   38   23    5    93
##  [9,]   69   73   78   72   66   10   82   21   43    95
## [10,]   87   80   22    7   43   70   54   54   23    79


Now we need a pretend taxonomy table.

# Name the OTUs and samples first; the printed tables in this lab show these names
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otumat), ncol = 7)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
taxmat

##       Domain Phylum Class Order Family Genus Species
## OTU1  "c"    "y"    "f"   "v"   "i"    "d"   "k"
## OTU2  "p"    "s"    "p"   "s"   "z"    "v"   "h"
## OTU3  "x"    "f"    "r"   "w"   "y"    "m"   "h"
## OTU4  "b"    "z"    "n"   "g"   "c"    "z"   "z"
## OTU5  "w"    "e"    "y"   "u"   "o"    "m"   "f"
## OTU6  "d"    "y"    "p"   "p"   "t"    "a"   "v"
## OTU7  "f"    "c"    "t"   "y"   "f"    "d"   "j"
## OTU8  "x"    "s"    "h"   "a"   "m"    "g"   "n"
## OTU9  "l"    "c"    "g"   "j"   "t"    "p"   "e"
## OTU10 "m"    "w"    "e"   "c"   "d"    "g"   "i"


Note how these are just vanilla R matrices. Now let's tell phyloseq how to combine them into a phyloseq object.

In the previous lines, we didn't even need to have phyloseq loaded yet. Now we do.

library("phyloseq"); packageVersion("phyloseq")

OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
# Combine the two components; the following slides refer to this object as physeq
physeq = phyloseq(OTU, TAX)
OTU

## OTU Table:          [10 taxa and 10 samples]
##                      taxa are rows
##      Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
## OTU1      91      27      10      19      61      72       1      58
## OTU2       9      96      43      39      40      72      55      45


Let's add to this, pretending we also had other types of data available.

Create random sample data, and add that to the combined dataset. Make sure that the sample names match the sample_names of the otu_table.

sampledata = sample_data(data.frame(
  Location = sample(LETTERS[1:4], size=nsamples(physeq), replace=TRUE),
  Depth = sample(50:1000, size=nsamples(physeq), replace=TRUE),
  row.names=sample_names(physeq),
  stringsAsFactors=FALSE
))
sampledata

## Sample Data:        [10 samples by 2 sample variables]:
##         Location Depth
## Sample1        D   164


Now create a random phylogenetic tree with the ape package, and add it to your dataset. Make sure its tip labels match your OTU table.

random_tree = ape::rtree(ntaxa(physeq), rooted=TRUE, tip.label=taxa_names(physeq))
plot(random_tree)


Now let's combine these all together. We can do this either by adding the new data components to the phyloseq object we already have, using merge_phyloseq, or by making a fresh call to phyloseq to build it again from scratch. The results should be identical, and we can check. You can always do either one with help from the accessor functions, and the choice is stylistic.

Merge new data with current phyloseq object:

physeq1 = merge_phyloseq(physeq, sampledata, random_tree)
physeq1

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10 taxa and 10 samples ]
## sample_data() Sample Data:       [ 10 samples by 2 sample variables ]
## tax_table()   Taxonomy Table:    [ 10 taxa by 7 taxonomic ranks ]


Rebuild phyloseq data from scratch using all the simulated data components we just generated:

physeq2 = phyloseq(OTU, TAX, sampledata, random_tree)
physeq2

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10 taxa and 10 samples ]
## sample_data() Sample Data:       [ 10 samples by 2 sample variables ]
## tax_table()   Taxonomy Table:    [ 10 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10 tips and 9 internal nodes ]


Are they identical? We can test for exact identity with the identical function.

identical(physeq1, physeq2)

## [1] TRUE


Let's build a couple tree plots with the new combined data.

plot_tree(physeq1, color="Location", label.tips="taxa_names", size = "abundance", justify = "left", ladderize="left", plot.margin=0.3)


Now how about some heatmaps.

plot_heatmap(physeq1, taxa.label="Phylum")


As you can see, you gain access to all the typical phyloseq tools, but without relying on any of the import wrappers.


Importing Data - BIOM-format

This is a format you are very likely to encounter.

The latest few versions of QIIME and other tools/workflows have begun using the biom-format as a standard output format. A single biom file can include various types of primary data and metadata.

Projects using the BIOM format:

· QIIME
· MG-RAST
· PICRUSt
· Mothur
· MEGAN
· VAMPS


The phyloseq package provides the import_biom function.

First, define the file paths. To make things easy, these are simply names of files in the current directory. See ?import_biom for details about the parseFunction argument. In short, it is a function (or NULL) that defines how taxonomy is processed.

biomfile = "rich_sparse_otu_table.biom"
treefile = "biom-tree.phy"
import_biom(biomfile, treefile, parseFunction=parse_taxonomy_greengenes)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 5 taxa and 6 samples ]
## sample_data() Sample Data:       [ 6 samples by 4 sample variables ]
## tax_table()   Taxonomy Table:    [ 5 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 5 tips and 4 internal nodes ]


We can also import the biom-file that we created during our QIIME example.

biomfile = "otu_table.biom"
treefile = "rep_set.tre"
ps0 = import_biom(biomfile, treefile, parseFunction=parse_taxonomy_greengenes)

## Warning: No greengenes prefixes were found. 
## Consider using parse_taxonomy_default() instead if true for all OTUs. 
## Dummy ranks may be included among taxonomic ranks now.
## Warning: No greengenes prefixes were found. 
## Consider using parse_taxonomy_default() instead if true for all OTUs. 
## Dummy ranks may be included among taxonomic ranks now.
## Warning: No greengenes prefixes were found. 
## Consider using parse_taxonomy_default() instead if true for all OTUs. 
## Dummy ranks may be included among taxonomic ranks now.
## Warning: No greengenes prefixes were found. 
## Consider using parse_taxonomy_default() instead if true for all OTUs. 


Not many samples and ~400 OTUs: good for a tree plot.

plot_tree(ps0, color = "Treatment", ladderize = "left", justify = "left")


Importing Data - UPARSE

You've already heard about OTU clustering with UPARSE. UPARSE is part of the larger usearch package, which defines its own cluster format (tab delimited).

In general, this file contains a row for every read, which means in practice hundreds of millions of rows, or more.

phyloseq provides an extremely efficient import function to read this file, the results of which are only an OTU table. You should then use tools from the previous "manual import" section to bind together other related data for phyloseq.
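To give a feel for what turning per-read rows into an OTU table involves, here is a toy base R sketch. It assumes a 10-column tab-delimited layout with the read label in column 9 and the OTU label in column 10, a "*" for unassigned reads, and read labels of the form sample_readnumber; the records themselves are invented. It is not the phyloseq importer, which is written to handle far larger files.

```r
# Toy cluster-format records (invented values); one row per read
uc_lines <- c(
  "H\t0\t250\t99.0\t+\t0\t0\t250M\tS1_1\tOTU_a",
  "H\t1\t250\t98.0\t+\t0\t0\t250M\tS1_2\tOTU_b",
  "H\t0\t250\t97.5\t+\t0\t0\t250M\tS2_1\tOTU_a",
  "N\t*\t*\t*\t.\t*\t*\t*\tS2_2\t*"
)
uc <- read.table(text = uc_lines, sep = "\t", header = FALSE,
                 colClasses = "character", stringsAsFactors = FALSE)

reads <- uc[[9]]   # assumed column of read labels
otus  <- uc[[10]]  # assumed column of OTU labels

# Drop reads with no OTU assignment
keep <- otus != "*"

# Assumed convention: the sample name is the part of the read label before "_"
samples <- sub("_.*$", "", reads[keep])

# Count reads per sample and OTU
otu_counts <- table(sample = samples, OTU = otus[keep])
otu_counts
```

The resulting counts have samples as rows and OTUs as columns, so a matrix like this would be passed to otu_table with taxa_are_rows = FALSE.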


Recall our UPARSE example from Lab 01 Part 02. I've copied those output files into this section, the most important of which is the cluster format (.uc) file.

Reference-only ("closed") OTU table:

OTU = import_usearch_uc("closed_map.uc")

## Reading ucfile into memory and parsing into table 
## Initially read 1312 entries. 
## ... Now removing unassigned OTUs (* or NA)... 
## Removed 529 entries that had no OTU assignment. 
## A total of 783 will be assigned to the OTU table.

SD = import_qiime_sample_data("Fasting_Map.txt")
tree = read_tree_greengenes("13_8_97_otus_unannotated.tree")
closedps = phyloseq(OTU, SD, tree)


Let's make the same tree, for comparison:

plot_tree(closedps, color = "Treatment", ladderize = "left", justify = "left")


Question: How could we import a "de novo" table from our UPARSE run?


The following doesn't work exactly because we named the de novo sequences differently between the QIIME and UPARSE runs. If we matched these up, though, we could compare.

Only 75 OTUs for de novo UPARSE, compared with 419 OTUs

OTU = import_usearch_uc("denovo_map.uc")

## Reading ucfile into memory and parsing into table 
## Initially read 1312 entries. 
## ... Now removing unassigned OTUs (* or NA)... 
## Removed 412 entries that had no OTU assignment. 
## A total of 900 will be assigned to the OTU table.

dim(OTU)

## [1] 9 75


Now add the taxonomy table from the complete GreenGenes reference. This is a bit unwieldy, and I should probably add an import_greengenes_taxonomy function for just this purpose.

library("data.table")
taxDT = fread("13_8_97_otu_taxonomy.txt", sep="\t", header=FALSE, colClasses="character")
setnames(taxDT, c("OTU", "taxstring"))
setkey(taxDT, "OTU")
closedtaxDT = taxDT[taxa_names(closedps), ]
closedtaxlist = lapply(
  lapply(closedtaxDT$taxstring, function(x){strsplit(x, split="; ", fixed=TRUE)[[1]]}),
  parse_taxonomy_greengenes
)
names(closedtaxlist) <- closedtaxDT$OTU
closedtax = build_tax_table(closedtaxlist)
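To see what the splitting and parsing steps do to a single entry, here is a base R sketch on one invented GreenGenes-style taxonomy string. It assumes the usual one-letter rank prefixes (k__, p__, and so on) separated by "; ", and it treats an empty rank as NA; that last choice is an assumption of this sketch and may differ from what parse_taxonomy_greengenes does.

```r
# One invented taxonomy string in the assumed GreenGenes style
taxstring <- "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__"

# Map the one-letter prefixes to rank names
ranks <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
           f = "Family", g = "Genus", s = "Species")

# Split into one field per rank, as the strsplit call in the slide code does
fields <- strsplit(taxstring, split = "; ", fixed = TRUE)[[1]]

# Separate the prefix from the value, and mark empty ranks as missing
prefix <- substr(fields, 1, 1)
value <- sub("^[a-z]__", "", fields)
value[value == ""] <- NA

parsed <- setNames(value, unname(ranks[prefix]))
parsed
```

Doing this for every OTU gives one named vector per OTU, which is the kind of list that build_tax_table is given in the slide code.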


Now merge the previous phyloseq object and the new taxonomyTable into one new phyloseq object.

closedps <- merge_phyloseq(closedps, closedtax)
closedps

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 73 taxa and 9 samples ]
## sample_data() Sample Data:       [ 9 samples by 6 sample variables ]
## tax_table()   Taxonomy Table:    [ 73 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 73 tips and 72 internal nodes ]


Repeat that tree, add phylum label.

plot_tree(closedps, color = "Treatment", label.tips = "Phylum", ladderize = "left", justify = "left")


Importing Data - QIIME-legacy

QIIME (legacy format)

Older versions of QIIME produced several files that are still supported input files for phyloseq.

· OTU table file. Essentially a tab-delimited file with OTU-abundance and taxonomic identity information.
· Map file. Stores sample covariates and demultiplexing information, like primers.
· Tree (Newick format) with a tip for each OTU, which can also be imported by this function.
· Reference sequences (fasta format). Though rarely analyzed later, phyloseq can import these as well.


Examples of these files are included within phyloseq.

otufile = system.file("extdata", "GP_otu_table_rand_short.txt.gz", package="phyloseq")
mapfile = system.file("extdata", "master_map.txt", package="phyloseq")
trefile = system.file("extdata", "GP_tree_rand_short.newick.gz", package="phyloseq")
rs_file = system.file("extdata", "qiime500-refseq.fasta", package="phyloseq")
qiimedata = import_qiime(otufile, mapfile, trefile, rs_file)

## Processing map file...
## Processing otu/tax file...
## Reading file into memory prior to parsing...
## Detecting first header line...
## Header is on line 2 
## Converting input file to a table...
## Defining OTU table... 
## Parsing taxonomy table...
## Processing phylogenetic tree...
## /Library/Frameworks/R.framework/Versions/3.1/Resources/library/phyloseq/extdata/GP_tree_rand_short.newick.gz ...


Importing Data - QIIME-DB

QIIME-DB is a server of publicly available datasets that includes the OTU table, taxonomy, and sample covariate ("mapping") table.

See the microbio_me_qiime tutorial

microbio_me_qiime(1457, ext = ".tgz")


Save imported data for later

We don't have to re-run data-importing tasks every time we come back to our data.

Instead we can save it in an R binary format, usually with the suffix .RData.

We can save every object in our current working environment using save.image, or we can select specific objects that we want to save.

save(ps0, closedps, qiimedata, file = "example-data.RData")
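For completeness, here is a small round-trip sketch in base R showing how selected objects can be saved and then restored with load. The objects x and y and the temporary file are placeholders for this example, not the lab's data.

```r
# Placeholder objects standing in for imported data
x <- matrix(1:4, nrow = 2)
y <- data.frame(a = 1:2)

# Save only the selected objects to a temporary .RData file
f <- tempfile(fileext = ".RData")
save(x, y, file = f)

# Simulate a fresh session, then restore; load() returns the restored object names
rm(x, y)
loaded <- load(f)
loaded
```

Because load restores objects under their original names, it can silently overwrite objects of the same name in the current session.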
