Principal component analysis for bacterial proteomic analysis II

Y-h. Taguchi, Chuo University

Akira Okamoto, Nagoya University

1. Introduction

2. Incubation condition of Streptococcus pyogenes & retrieval for proteomics data

3. PCA analysis of proteome

4. Biological meanings of obtained proteins

5. Summary & Conclusion

1. Introduction

Streptococcus pyogenes (Japanese: 化膿レンサ球菌, "pyogenic streptococcus") is ...

normal bacterial flora, but can also cause life-threatening diseases.

Thus, it is important to know what triggers S. pyogenes to cause such dangerous diseases.

→ In this study, we employ proteomic analysis of S. pyogenes during the growth phase under two distinct culture conditions.

2. Incubation condition of Streptococcus pyogenes & retrieval for proteomics data

37 ℃, until ... or 4, 6, 14 and 20 hours (OD660 = 0.40, 0.83, 0.92, and 0.90)

Under two distinct conditions

1) shaking (sha): more oxidative stress

2) static (sta): ordinary condition

Fraction

Cell (wc) and Supernatant (snt) [using centrifuge]

Three biological replicates each

Retrieval of proteomic data

Mass spectrometry detection of fragmented proteins [by LTQ-Orbitrap XL + LC]

Protein identification by MASCOT Software

%emPAI values (normalized protein amounts) are used for further analysis.
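As a rough illustration of this normalization, the sketch below assumes that %emPAI means each protein's emPAI value expressed as a percentage of the summed emPAI values in the same sample. The function name `percent_empai` and the input values are hypothetical; in practice the emPAI values would come from the MASCOT output.

```python
def percent_empai(empai_by_protein):
    """Convert raw emPAI values of one sample into percentages of the sample total."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("sum of emPAI values must be positive")
    return {protein: 100.0 * value / total
            for protein, value in empai_by_protein.items()}


# Hypothetical emPAI values for three proteins in one sample.
sample = {"protein_A": 2.0, "protein_B": 1.0, "protein_C": 1.0}
normalized = percent_empai(sample)
```

After this step the values of every sample sum to 100, which makes samples with different total protein amounts comparable.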

3. PCA analysis of proteome

The feature (protein) selection method is similar to the one in yesterday's presentation, “Refined blood-borne miRNome of human diseases via PCA-based feature extraction”, so its details are skipped here.

The reason for employing this method differs from yesterday's reason (“selection of miRNA biomarker independent of selection of training/test sets”).

Here, the aim is to answer the question “What is significantly expressed during these processes?” without any prejudgement.

We would like to list “any” significant features in this experiment, e.g.,

Temporal significance

Incubation condition significance

Fraction significance

Or their combinations.....

[Figure: schematic profiles over time for sta:wc, sha:wc, sta:snt and sha:snt]

Unsupervised methods like PCA are useful. (Clustering may be OK too, but it forces a hierarchy or a prejudged number of clusters.)

Results

PCA embeddings of samples

⇒ three clear clusters ⇒ What are representative proteins?

Plot legend (shaking): sha05_wc, sha05_snt, sha07_wc, sha07_snt, sha14_wc, sha14_snt, sha20_wc, sha20_snt

Plot legend (static): sta04_wc, sta04_snt, sta06_wc, sta06_snt, sta14_wc, sta14_snt, sta20_wc, sta20_snt
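A sample embedding of this kind can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis behind the slide: the function name `pca_embed`, the matrix sizes and the random data are all assumptions, and the real input would be the samples × proteins matrix of %emPAI values.

```python
import numpy as np


def pca_embed(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    Xc = X - X.mean(axis=0)                            # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # coordinates of each row
    explained = (s ** 2) / np.sum(s ** 2)              # variance fraction per component
    return scores, explained[:n_components]


# Synthetic stand-in for a samples x proteins matrix:
# 3 groups of 4 samples, 50 proteins.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 50))
X = np.vstack([c + rng.normal(scale=0.5, size=(4, 50)) for c in centers])

scores, explained = pca_embed(X)
```

Plotting the two columns of `scores` against each other gives the two-dimensional map in which clusters of samples can be inspected by eye, without fixing the number of clusters beforehand.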

UPGMA ⇒ ✕ (does not work)

sha05_snt1 sha05_snt2 sha05_snt3 sha07_snt1 sha07_snt2 sha07_snt3 sha20_wc1 sha20_wc2 sta04_snt1 sta04_snt2 sta04_snt3

sha05_wc1 sha05_wc2 sha05_wc3 sha07_wc1 sha07_wc2 sha07_wc3 sha14_wc1 sha14_wc2 sha14_wc3 sha20_wc3 sta04_wc1 sta04_wc2 sta04_wc3 sta06_wc1 sta06_wc2 sta06_wc3 sta14_wc1 sta14_wc2 sta14_wc3 sta20_wc1 sta20_wc2 sta20_wc3

sha14_snt1 sha14_snt2 sha14_snt3 sha20_snt1 sha20_snt2 sha20_snt3 sta06_snt1 sta06_snt2 sta06_snt3 sta14_snt1 sta14_snt2 sta14_snt3 sta20_snt1 sta20_snt2 sta20_snt3

K-means ⇒ △ (works only partially; assumes three clusters)
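The dependence of K-means on a prejudged number of clusters can be seen in a minimal sketch of Lloyd's algorithm. The function name `kmeans`, the seed and the synthetic points are hypothetical; this is not the implementation used for the slide.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; k must be chosen before seeing the result."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic two-dimensional points standing in for embedded samples.
rng = np.random.default_rng(2)
points = np.vstack([rng.normal(loc=c, scale=0.3, size=(4, 2))
                    for c in ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])])

labels_k2, centroids_k2 = kmeans(points, k=2)
labels_k3, centroids_k3 = kmeans(points, k=3)
```

Both calls return a partition, and nothing in the algorithm itself says which value of `k` is appropriate; that choice has to be supplied from outside.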

Mclust: Model-Based Clustering

The optimal model according to BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models.

best model: diagonal, equal shape with 9 components

Warning messages:
1: In hcEII(data = data) : # of observations <= data dimension
2: In summary.mclustBIC(Bic, data, G = G, modelNames = modelNames) : best model occurs at the min or max # of components considered
3: In Mclust(t(XX)) : optimal number of clusters occurs at max choice

Mclust does not work without prior feature selection, so it is useless as a feature selection tool.

PCA embeddings of proteins

SPy1489:hlpA, SPy2039:speB, Spy1073:rplL, SPy2005, SPy2018:emm1, Spy0059:rpmC, Spy0611:tufA, Spy0274:plr, Spy0062:rplX, SPy2043:mf, Spy0613:tpi, Spy2079:AhpC, SPy1831:rpsF, Spy2160:rpmG, SPy1373:ptsH, SPy0731:eno, Spy1371:gapN, Spy1881:pgk, SPy0711:speC, Spy0071:rpmD, SPy2070:groEL, Spy0019, SPy0712:mf2

23 proteins selected (ribosomal proteins are underlined on the original slide)

PCA embeddings of samples with only the 23 selected proteins ⇒ Configuration is conserved ⇒ These 23 proteins are critical for this configuration

We repeated the same procedure after removing these 23 proteins, and an additional 30 proteins were selected.
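The select, re-embed and repeat loop described above can be sketched as follows. The selection rule itself is not given in these slides, so the sketch assumes one: proteins are ranked by their distance from the origin in the PC1-PC2 protein embedding and the top few are kept. The function names `embed` and `select_outlier_features`, the synthetic data and the number of selected proteins are all hypothetical.

```python
import numpy as np


def embed(X, n_components=2):
    """PCA scores of the rows of X (columns are centered first)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def select_outlier_features(X, n_select):
    """Indices of the n_select proteins farthest from the origin in the protein embedding.

    X has shape (n_proteins, n_samples). Ranking by distance from the
    origin is an assumption made for this sketch.
    """
    radius = np.linalg.norm(embed(X), axis=1)
    return np.argsort(radius)[::-1][:n_select]


# Synthetic data: 200 proteins x 12 samples; the first 5 proteins carry
# a strong two-group pattern across the samples.
rng = np.random.default_rng(1)
X = rng.normal(scale=0.1, size=(200, 12))
pattern = np.array([1.0] * 6 + [-1.0] * 6)
X[:5] += 10.0 * pattern

first = select_outlier_features(X, n_select=5)

# Second round: remove the first selection and repeat on the remainder.
remaining = np.setdiff1d(np.arange(X.shape[0]), first)
second = remaining[select_outlier_features(X[remaining], n_select=5)]

# Sample embedding restricted to the selected proteins (samples as rows).
sample_scores = embed(X[first].T)
```

In this toy case the first round recovers the five planted proteins, and the sample embedding built from them alone still separates the two sample groups along the first component, which mirrors the "configuration is conserved" check on the slide.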

4. Biological meanings of obtained proteins

Peroxiredoxin reductase (SPy2079:AhpC), which is estimated to be involved in oxygen metabolism and hydrogen peroxide decomposition, is found in the shaking culture condition rather than the static condition. An increased amount of AhpC under shaking seems reasonable, because the shaking condition induces higher oxygen stress.

This is just an example. Almost all selected proteins are biologically reasonable.

5. Summary & Conclusion

・Feature (protein) extraction based upon PCA extracted biologically important proteins.

・A PCA-based unsupervised feature extraction method is applied to proteomics data.

・Proteomic analysis is applied to the growth phases of S. pyogenes.

・At the moment, we have not yet figured out the trigger of disease, but more extensive research will enable us to understand it.

Collaborator wanted!
