Cell Host & Microbe, Volume 22

Supplemental Information

Multi-platform 'Omics Analysis of Human Ebola Virus Disease Pathogenesis

Amie J. Eisfeld, Peter J. Halfmann, Jason P. Wendler, Jennifer E. Kyle, Kristin E. Burnum-Johnson, Zuleyma Peralta, Tadashi Maemura, Kevin B. Walters, Tokiko Watanabe, Satoshi Fukuyama, Makoto Yamashita, Jon M. Jacobs, Young-Mo Kim, Cameron P. Casey, Kelly G. Stratton, Bobbie-Jo M. Webb-Robertson, Marina A. Gritsenko, Matthew E. Monroe, Karl K. Weitz, Anil K. Shukla, Mingyuan Tian, Gabriele Neumann, Jennifer L. Reed, Harm van Bakel, Thomas O. Metz, Richard D. Smith, Katrina M. Waters, Alhaji N'jai, Foday Sahr, and Yoshihiro Kawaoka



Figure S1. Related to Figure 1. Multi-platform ‘omics analysis workflow. The figure shows the workflow used to process and/or extract/inactivate materials for ELISAs, viral load quantification, and ‘omics analyses. It also indicates the type of material shipped to the U.S. and the analytical procedure applied to each material type.

Figure S2. Related to Figure 2. Ebola transcripts in PBMC and proteins in plasma, and host response overview. (A) The graph shows the total number of EBOV reads (combining positive- and negative-sense viral genomic RNA) per million total sequenced reads in each sample. Variation is represented as standard deviation. Low-level EBOV counts in four control samples are most likely due to barcode bleed-through. (B) The graphs show the log2 normalized raw protein expression values of EBOV glycoprotein (GP), VP40, and nucleoprotein (NP) in all sample groups. Variation is represented by standard deviation. For panels (A) and (B): F, fatalities; S1, survivor sample 1; S2, survivor sample 2; S3, survivor sample 3; H, healthy controls. (C) For each ‘omics platform, the total number of detected molecules is indicated in the upper left of each panel. Each panel also shows the total number of molecules exhibiting significantly altered levels (P < 0.01), the number of molecules exhibiting increased and decreased expression among those that were significantly altered, and the maximum (‘Max’) and minimum (‘Min’) expression changes observed for significantly altered transcripts for the following comparisons: EVD fatalities (F) vs. healthy controls (H), EVD survivors’ samples vs. healthy controls (S1, S2, or S3 vs. H), and S1 vs. F. For the S1 vs. F comparison, maximum and minimum values are represented as the direction of expression in EVD survivors. For lipidomics analyses, positive and negative ionization results are shown separately. Colored bars in the ‘Up-regulated’ and ‘Down-regulated’ columns depict the relative number of molecules in each condition.
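The normalization described for panel (A) can be illustrated with a minimal sketch. It assumes that positive- and negative-sense viral read counts are summed before scaling; the function name and the read counts below are hypothetical and are not taken from the study.

```python
def reads_per_million(viral_reads: int, total_reads: int) -> float:
    """Scale a viral read count to reads per million total sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


# Hypothetical counts for one sample (illustrative only).
positive_sense_reads = 1_200
negative_sense_reads = 300
total_sequenced_reads = 30_000_000

# Combine both strands first, then normalize to the sequencing depth.
rpm = reads_per_million(positive_sense_reads + negative_sense_reads,
                        total_sequenced_reads)
print(rpm)
```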

Figure S3. Related to Figure 5 and Figure 6. Overview of MEGENA, and MEGENA module enrichment summary. (A) Overview of MEGENA: (i) The MEGENA algorithm was applied to all transcripts identified in PBMCs regardless of expression levels in different conditions. (ii) MEGENA analysis output includes modules of co-expressed genes, which may be further divided into sub-modules (i.e., a network of modules). The module hierarchy of the EVD PBMC co-expression network is shown in this panel, with individual modules represented by black nodes and co-expression relationships between modules represented by black edges connecting the nodes. The entire co-expression network is represented by the central node. (iii) Sub-modules within a larger module – each with uniquely correlated co-expression behavior – are referred to as ‘child’ modules, and the larger module is referred to as the ‘parent’ module. In this panel, first-level child nodes of the entire co-expression network are located in the first concentric ring outside of the center (indicated by the red circle). Child sub-networks for level 1 nodes are located in the second concentric ring (level 2, blue circle), and additional levels of child sub-networks are indicated by green, orange, and gray circles. (iv) An example of a module sub-network (for Module 133 [level 2], indicated by the purple circle in panel (iii)) is shown. Individual Module 133 transcripts are represented by colored nodes and expression relationships between transcripts are represented by edges connecting the nodes. Level 3 child modules within Module 133 are differentiated by shades of red, orange, and yellow. (v) Following creation of the MEGENA co-expression network, module overlaps were calculated for increased and decreased transcripts identified by various patient group comparisons. An example of the level of overlap between increased transcripts in F vs. H, S1 vs. H, or S2 vs. H comparisons and EVD PBMC Module 2 is shown. (vi) Subsequent to identification of module overlaps, module functions may be explored by examining expression dynamics and/or module enrichment. For additional information, refer to Table S5. (B) Graphs show the level of enrichment of each MEGENA parent module for significantly altered transcripts (q-value < 0.01) in the EVD fatalities vs. healthy controls (H) (top), EVD survivors’ samples 1 vs. H (middle) and EVD survivors’ samples 2 vs. H (bottom) comparisons. Module enrichment values are represented as the negative log10 adjusted P-value. Module enrichment was calculated separately for transcripts exhibiting increased (red) or decreased (blue) expression.

Figure S4. Related to Figure 5. Module 2 transcript expression. This figure shows data associated with MEGENA Module 2, derived from PBMC transcriptome data. (A) The panel depicts heat maps of average PBMC expression levels and associated q-values for all transcripts exhibiting significantly altered expression (q-value < 0.01) in at least one condition when comparing EVD patients (fatalities, ‘F’; survivors’ first, second, and third samples, ‘S1’, ‘S2’, and ‘S3’) to healthy controls (H). For the transcriptome heat map, values are displayed as the direction of expression in the EVD patient. Columns show expression and q-values for individual transcripts (IDs not shown) and rows represent different comparison groups. (B) The panel depicts transcript level and q-value heat maps for mitotic regulators that are known to be transcriptionally associated with monocyte-to-macrophage differentiation. For the transcriptome heat maps, values are displayed as the direction of expression in the EVD patient. Columns represent different comparison groups and rows show expression and q-values for individual transcripts (represented as Entrez Gene Official Symbols). FC, fold-change.

Figure S5. Related to Figure 6. MEGENA Modules 18 and 27. (A) Module 18 and 27 networks generated by using MEGENA are shown for the EVD survivors’ sample 1 (S1) or EVD fatalities (F) vs. healthy controls (H) comparisons. Nodes represent individual transcripts and edges represent the expression relationship between transcripts. Nodes representing non-significant transcripts are shown in gray, and for significantly changed transcripts (q < 0.01), nodes are colored according to the log2 fold-change (FC). (B) The panel depicts heat maps of average PBMC transcript expression levels and associated q-values for all Module 27 transcripts that were significantly altered (q < 0.01) in at least one condition when comparing EVD patients (fatalities, ‘F’; survivors’ first, second, and third samples, ‘S1’, ‘S2’, and ‘S3’) to healthy controls (H). For the expression heat map, values are displayed as the direction of expression in the EVD patient. Columns show expression and q-values for individual transcripts (IDs not shown) and rows represent different comparison groups. FC, fold change. (C) The graph depicts average expression levels for individual Module 27 transcripts in EVD vs. healthy control comparisons, with q-values indicated by the colored dots at the top of each bar.

Figure S6. Related to Figure 5 and Figure 6. Comparison with other transcriptomics datasets. (A) The heat map shows transcript expression as log2 effect size (i.e., the fold-change standardized by the standard deviation) for the PD-1 signaling pathway (see Table S5), which was enriched in EVD fatalities from this study and in patients with septic shock (GSE48080). Values are displayed as the direction of expression in the EVD (fatalities, ‘F’; survivors’ first, second, and third samples, ‘S1’, ‘S2’, and ‘S3’) or sepsis patients compared to healthy controls (H). Columns show expression for individual transcripts (IDs shown at the bottom), rows represent different comparison groups, and the pathway enrichment score (-log10 P-value) for each comparison is shown at the right of the panel. (B) Heat maps show transcript expression for ‘Interferon α/β Signaling’ (top) and ‘TCA Cycle and Respiratory Electron Transport’ (bottom), which were among the enriched pathways in EVD patients from this study or a single EVD patient that was treated in the U.S. (GDS4356) (see Table S5). The heat maps are represented as described for panel (A), except that specific comparisons are indicated by colored lines at the left of the heat map.
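The legend defines the effect size only as a fold-change standardized by the standard deviation. The sketch below shows one possible reading: the difference in mean log2 expression between patients and controls, divided by the pooled standard deviation. The choice of a pooled standard deviation, the function name, and the example values are assumptions for illustration; the authors' exact formula may differ.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_log2_difference(case_log2, control_log2):
    """Mean log2 difference (case minus control) divided by the pooled SD.

    Assumption: a pooled standard deviation is used for standardization.
    Positive values indicate higher expression in the case group.
    """
    n_case, n_ctrl = len(case_log2), len(control_log2)
    pooled_var = ((n_case - 1) * stdev(case_log2) ** 2
                  + (n_ctrl - 1) * stdev(control_log2) ** 2) / (n_case + n_ctrl - 2)
    return (mean(case_log2) - mean(control_log2)) / sqrt(pooled_var)


# Hypothetical log2 expression values for one transcript (illustrative only).
patients = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.0]
print(standardized_log2_difference(patients, controls))
```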

Figure S7. Related to Figure 7. Biomarker discovery pipeline. (A) This panel shows the nested cross-validation pipeline for biomarker discovery. Patient samples were filtered by various maximum times from onset (8 – 11 days) and assigned to training and test sets. Training sets included samples with known survival outcomes (‘Real Data’) and samples for which outcome was randomly assigned (‘Bootstrap Permutation’). Repeated measures were removed from each training/test set. Leave-one-out cross validation (LOOCV) was then used on the training set to estimate model complexity on each dataset separately and combined. Least absolute shrinkage and selection operator (LASSO)-constrained logistic regression parameters were estimated using survival as the binary outcome. Receiver operator characteristics (ROC) were assessed using hold-outs from the first level of cross validation (CV). PCA, principal components analysis; PPCA, probabilistic principal components analysis. (B) This panel shows biomarker ROC curves. Logistic regression models resulting from the LASSO constrained, nested LOOCV pipeline were used to predict survival outcomes of hold-out patients and then to derive ROC curves. This was done separately for each of 8 datasets (displayed in separate panels). Colored ROC curves represent real data with maximum time from onset filters set for the training sets (blue, green, red, and black, respectively, for days 8–11; note that line patterns also differentiate each of the day filters). Gray curves display performance under identical conditions after randomly permuting patient identifiers. The range of areas under the colored ROC curves (AUC) are listed under the dataset title of each panel. ‘lipid_NEG’, lipids identified by negative ionization LC-MS/MS; ‘lipid_POS’.
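The core of the pipeline in panel (A), an outer leave-one-out loop that scores each hold-out with a LASSO-constrained logistic regression whose penalty is chosen by an inner LOOCV on the training set, can be sketched as follows. This is a simplified, standard-library-only illustration and not the authors' implementation: the proximal-gradient solver, the penalty grid, the log-loss selection criterion, and the toy data are all assumptions, and the time-from-onset filters, repeated-measure removal, PCA/PPCA steps, and permutation controls are omitted.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, steps=150, lr=0.2):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(p):
            wj = w[j] - lr * grad_w[j]
            # Soft-thresholding applies the L1 (LASSO) penalty.
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def loo_splits(n):
    # Yield (training indices, hold-out index) for leave-one-out CV.
    for i in range(n):
        yield [k for k in range(n) if k != i], i


def choose_lambda(X, y, grid):
    """Inner LOOCV: pick the penalty with the lowest held-out log-loss."""
    best_lam, best_loss = None, float("inf")
    for lam in grid:
        loss = 0.0
        for train, i in loo_splits(len(X)):
            model = fit_lasso_logistic([X[k] for k in train],
                                       [y[k] for k in train], lam)
            p = min(max(predict(model, X[i]), 1e-12), 1 - 1e-12)
            loss -= y[i] * math.log(p) + (1 - y[i]) * math.log(1 - p)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam


def nested_loocv_scores(X, y, grid):
    """Outer LOOCV: score each hold-out with a model tuned without it."""
    scores = []
    for train, i in loo_splits(len(X)):
        Xt, yt = [X[k] for k in train], [y[k] for k in train]
        lam = choose_lambda(Xt, yt, grid)
        scores.append(predict(fit_lasso_logistic(Xt, yt, lam), X[i]))
    return scores


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: feature 0 tracks the outcome, feature 1 is noise.
X = [[-3.0, 0.5], [-2.8, -0.5], [-2.6, 0.5], [-2.4, -0.5], [-2.2, 0.5], [-2.0, -0.5],
     [2.0, 0.5], [2.2, -0.5], [2.4, 0.5], [2.6, -0.5], [2.8, 0.5], [3.0, -0.5]]
y = [0] * 6 + [1] * 6  # 1 = survived, 0 = died (illustrative labels)
hold_out_scores = nested_loocv_scores(X, y, grid=[0.01, 0.1])
print(round(auc(hold_out_scores, y), 3))
```

Because every hold-out is scored by a model that never saw it, including during penalty selection, the resulting AUC is not inflated by tuning on the test sample. Re-running the same procedure after shuffling the outcome labels would give a chance-level reference comparable to the gray curves in panel (B).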

Table S6. Related to Figure 7. Samples used for biomarker prediction pipeline.

Samples¹ Used for Training and Test Sets:

UW001, UW002, UW003, UW004, UW005, UW006, UW013, UW019, UW022, UW025, UW028, UW031, UW034, UW035, UW036, UW037, UW039, UW040, UW042, UW043, UW056, UW057, UW061, UW062, UW081, UW082

Samples¹ Excluded from Training Sets Due to Missing Data:

Clinical: UW004, UW005, UW006, UW036, UW081, UW082

Cytokines: UW001, UW019, UW036, UW037, UW056

Proteins: UW001, UW004, UW036, UW061

Transcripts, Lipids and Metabolites: None

All Platforms: UW001, UW004, UW005, UW006, UW019, UW036, UW037, UW056, UW061, UW081, UW082

¹Additional sample information can be found in Table S1.

Table S7. Related to STAR Methods. Lipidomics LC-MS elution gradient.

Time (min)    % MPA¹    % MPB²
0.5           90        10
2             70        30
10            60        40
20            45        55
40            40        60
70            0.5       99.5
90            0.5       99.5
92            93        7
142           93        7

¹MPA, mobile phase A
²MPB, mobile phase B
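To read the mobile-phase composition at a time between the tabulated points, the gradient can be interpolated. The sketch below assumes linear ramps between consecutive time points and a constant composition before the first and after the last point; the table does not state the ramp shape, so this is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# (time in min, % MPB) pairs transcribed from Table S7.
GRADIENT = [(0.5, 10.0), (2.0, 30.0), (10.0, 40.0), (20.0, 55.0), (40.0, 60.0),
            (70.0, 99.5), (90.0, 99.5), (92.0, 7.0), (142.0, 7.0)]


def percent_mpb(t_min: float) -> float:
    """Return % mobile phase B at t_min, assuming linear ramps between points."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min)  # index of the first tabulated time > t_min
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_mpa(t_min: float) -> float:
    # The two mobile phases sum to 100% at every tabulated point.
    return 100.0 - percent_mpb(t_min)


print(percent_mpb(15.0), percent_mpa(15.0))
```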