statistical analysis of metabolomics data 1 statistical analysis of metabolomics data xiuxia du...
Post on 01-May-2018
227 Views
Preview:
TRANSCRIPT
01/27/2014
1
Statistical Analysis of Metabolomics Data
Xiuxia Du
Department of Bioinformatics & Genomics
University of North Carolina at Charlotte
Outline
2
• Introduction
• Data pre-treatment1. Normalization
2. Centering, scaling, transformation
• Univariate analysis1. Student’s t-tes
2. Volcano plot
• Multivariate analysis1. PCA
2. PLS-DA
• Machine learning
• Software packages
01/27/2014
2
Results from data processing
3
Next, analysis of the quantitative metabolite information …
Metabolomics data analysis
4
• Goals– biomarker discovery by identifying significant features associated
with certain conditions
– Disease diagnosis via classification
• Challenges– Limited sample size
– Many metabolites / variables
• Workflow
pre-treatmentunivariateanalysis
multivariate analysis
machine learning
01/27/2014
3
Data Pre-treatment
5
Normalization (I)
6
• Goals– to reduce systematic variation
– to separate biological variation from variations introduced in the experimental process
– to improve the performance of downstream statistical analysis
• Sources of experimental variation– sample inhomogeneity
– differences in sample preparation
– ion suppression
01/27/2014
4
Normalization (II)
7
• Approaches– Sample-wise normalization: to make samples comparable to each
other
– Feature/variable-wise normalization: to make features more comparable in magnitude to each other.
• Sample-wise normalization– normalize to a constant sum
– normalize to a reference sample
– normalize to a reference feature (an internal standard)
– sample-specific normalization (dry weight or tissue volume)
• Feature-wise normalization (i.e., centering, scaling, and transformation)
Centering, scaling, transformation
8
01/27/2014
5
Centering
• Converts all the concentrations to fluctuations around zero instead of around the mean of the metabolite concentrations
• Focuses on the fluctuating part of the data
• Is applied in combination with data scaling and transformation
9
Scaling
• Divide each variable by a factor
• Different variables have a different scaling factor
• Aim to adjust for the differences in fold differences between the different metabolites.
• Results in the inflation of small values
• Two subclasses– Uses a measure of the data dispersion
– Uses a size measure
10
01/27/2014
6
Scaling: subclass 1
• Use data dispersion as a scaling factor– auto: use the standard deviation as the scaling factor. All the
metabolites have a standard deviation of one and therefore the data is analyzed on the basis of correlations instead of covariance.
– pareto: use the square root of the standard deviation as the scaling factor. Large fold changes are decreased more than small fold changes and thus large fold changes are less dominant compared to clean data.
– vast: use standard deviation and the coefficient of variation as scaling factors. This results in a higher importance for metabolites with a small relative sd.
– range: use (max-min) as scaling factors. Sensitive to outliers.
11
Scaling: subclass 2
• Use average as scaling factors– The resulting values are changes in percentages compared to the
mean concentration.
– The median can be used as a more robust alternative.
12
01/27/2014
7
Transformation
• Log and power transformation
• Both reduce large values relatively more than the small values.
• Log transformation– pros: removal of heteroscedasticity
– cons: unable to deal with the value zero.
• Power transformation– pros: similar to log transformation
– cons: not able to make multiplicative effects additive
13
Centering, scaling, transformation
A. original dataB. centeringC. autoD. paretoE. rangeF. vastG. levelH. logI. power
14
01/27/2014
8
Log transformation, again
• Hard to do useful statistical tests with a skewed distribution.
• A skewed distribution or exponentially decaying distribution can be transformed into a Gaussian distribution by applying a log transformation.
http://www.medcalc.org/manual/transforming_data_to_normality.php15
Univariate vs. multivariate analysis
• Univariate analysis examines each variable separately.– t-tests
– volcano plot
• Multivariate analysis considers two or more variables simultaneously and takes into account relationships between variables.– PCA: Principle Component Analysis
– PLS-DA: Partial Least Squares-Discriminant Analysis
• Univariate analyses are often first used to obtain an overview or rough ranking of potentially important features before applying more sophisticated multivariate analyses.
16
01/27/2014
9
Univariate Statisticst-test
volcano plot
17
Univariate statistics
• A basic way of presenting univariate data is to create a frequency distribution of the individual cases.
Due to the Central Limit Theorem, many of these frequency distributions can be modeled as a normal/Gaussian distribution.
18
01/27/2014
10
Gaussian distribution
φ μ,σ
2(
0.8
0.6
0.4
0.2
0.0
1
x
1.0
0 2 4
x)
0,μ=0,μ=0,μ=μ=
2 0.2,σ =2 1.0,σ =2 σ =2 σ =
• The total area underneath each density curve is equal to 1.
(x) 1
2e x
2
mean
variance 2
standard deviation
0.0
0.1
0.2
0.3
0.4
−2σ −1σ 1σ−3σ 3σµ 2σ
34.1% 34.1%
13.6%2.1%
13.6% 0.1%0.1%2.1%
https://en.wikipedia.org/wiki/Normal_distribution19
Sample statistics
https://en.wikipedia.org/wiki/Normal_distribution
Sample mean: X=1
nXi
i1
n
Sample variance: S2 1
n1Xi X 2
i1
n
Sample standard deviation: S S2
20
01/27/2014
11
t-test (I)
Null hypothesis H0: 0
Alternative hypethesis H1: <0
Test statistic: t x 0
s n
Sample standard deviation: s 1
n1xi x 2
i1
n
The test statistic t follows a student’s t distribution. The distribution has n-1degrees of freedom.
• One-sample t-test: is the sample drawn from a known population?
21
t-test (II): p-value
When the null hypothesis is rejected, the result is said to be statistically significant.
22
01/27/2014
12
t-test (III)
• Two-sample t-test: are the two populations different?
• The two samples should be
independent.
Null hypothesis H0: 1 2 0
Alternative hypethesis H1: 1 2 0
Test statistic: t x1 x2 12
s12
n1
s22
n2
23
t-test (IV)
Equivalent statements:
• The p-value is small.
• The difference between the two populations is unlikely to have occurred by chance, i.e. is statistically significant.
24
01/27/2014
13
t-test (V)
• The p-value is big.• The difference between the
two populations are said NOT to be statistically significant.
25
t-test (VI)
• Paired t-test: what is the effect of a treatment?
• Measurements made on the same individuals before and after the treatment.
Example: Subjects participated in a study on the effectiveness of a certain diet on serum cholesterol levels.
Subject Before After Difference
1 201 200 -1
2 231 236 +5
3 221 216 -5
5 260 243 -17
6 228 224 -4
7 245 235 -10
H0 :d 0
Ha :d 0
Test statistic: t d dsd n
26
01/27/2014
14
Volcano plot (I)
• Plot fold change vs. significance
• y-axis: negative log of the p-value
• x-axis: log of the fold change so that changes in both directions (up and down) appear equidistant from the center
• Two regions of interest: those points that are found towards the top of the plot that are far to either the left- or the right-hand side.
27
Volcano plot (II)
28
01/27/2014
15
Multivariate statisticsPCA
PLS-DA
29
PCA (I)
• PCA is a statistical procedure to transform a set of correlated variables into a set of linearly uncorrelated variables.
30
01/27/2014
16
PCA (II)
• The uncorrelated variables are ordered in such a way that the first one accounts for as much of the variability in the data as possible and each succeeding one has the highest variance possible in the remaining variables.
• These ordered uncorrelated variables are called principle components.
• By discarding low-variance variables, PCA helps us reduce data dimension and visualize the data.
31
PCA (III)
32
• The transformation matrix
• The transformation
• are called the 1st, 2nd, and nth principle components, respectively.p1, p2,, pn
01/27/2014
17
PCA (IV)
33
• Each original sample is represented by an n-dimensional vector:
• After the transformation– If all of the principle components are kept, then each sample is still
represented by an n-dimensional vector:
– If only m < n principle components are kept, then each sample will be represented by an m-dimensional vector:
Sbefore transformation x1, x2,, xn
Safter transformation y1, y2,, yn
Safter transformation y1, y2,, ym
PCA (V)
34
• y are called scores.
• For visualization purpose, m is usually chosen to be 2 or 3.
• As a result, each sample will be represented by a 2- or 3-dimenational point in the score plot.
●
●
●
● ●
●
●
●
●
●
●
●
• 30 • 20 • 10 0 10
•10
010
2030
Principal Component Analysis (Scores)
PC1
PC
2
wt15
wt16
wt18
wt19 wt21
wt22
ko15
ko16
ko18
ko19
ko21ko22
01/27/2014
18
PCA (VI)
35
• Loadings
• are denoted as points in the loadings plotp11, p12 , p21, p22 ,, pn1, pn2
PCA (VII)
36• 0.06 • 0.04 • 0.02 0.00 0.02 0.04
•0.0
6•0
.04
•0.
020.
000.
020.
040.
06
Principal Component Analysis (Loadings)
PC1
PC
2
200.1/2924
200.9/3473202.1/3497
204.9/2590
205/2788 206/2789
207/2744
207.1/2717
213.2/3497
214.1/2519
215.2/3478
215.9/3506
215.9/3170
218.2/3366
219.1/2523
225/3232
225.1/3265
225.1/3151
225.1/3668
226/2962
229.1/3424
231.1/2849
231.1/2528
233/3026
233.1/3025
235/3130
235.1/2716
236.1/2522
236.9/3632
239.1/3252
241.1/3646
241.2/3641
243.1/3542 243.2/3651
244/4103
244.1/2830
244.9/3130
245.1/2882
245.9/3132
246.4/3287
246.7/3295
246.8/3281
247/3490
247.1/2648
247.9/2979
248.2/3240
249.1/3631250.1/3639
250.1/3674
250.2/3627
250.2/4059
251.1/3880
251.2/3882
254.1/3226255.1/3668
256.1/3444256.1/3467
256.1/3652
256.2/3454
256.2/3636256.2/3662
257.1/3660
257.1/3640
257.2/3647
257.9/3425
258/3497
258/3232
259.1/2802
261.1/3148
264.2/3158
265.1/2620265.2/3659
265.2/4114 266.1/3624
266.2/3634
267.1/2783
267.2/3628268.1/3868
268.2/3888
269.1/3886
269.2/3888
269.2/4129
270.1/3891
270.2/3880
270.5/3166
271.1/3879
271.2/3404
271.3/3264
272.1/3123
272.1/2727
273.1/3425
273.1/3655
273.2/3424
273.9/2640
274.2/3431
277/2638
277.2/3845
279/2788
279/2755
280/2788
280.1/3315
280.2/3319
281/2793
281.1/2788
281.2/3672
282.1/2616
282.1/3468
282.2/3471
283.1/3636
283.2/3883
283.2/3640
283.2/3675 283.2/3134
284.1/2716
284.1/3634
284.2/3884284.2/3627
284.2/3675
285.1/3854
285.1/4129285.1/2716
285.2/3861
285.2/4127
285.2/3637
286/2620
286.1/3263
286.1/3283
286.1/3851
286.2/3256
286.2/2855
287/2647
287.1/3256
287.1/2635
287.2/3262
288.1/2794
288.2/2795
288.2/2922
289/2952
289.1/3118
289.1/2718
289.2/3125
291.1/3629
292.1/3492
292.2/3217
292.2/3459
294.1/2577
296.1/2727
296.9/4127
297.1/3848
297.2/3854
297.3/3381
298.1/3182
298.1/3671
298.2/3187
299.2/4126
300.2/3390
301/2787
301.1/3891
301.1/3384
301.1/3681
301.2/3391
301.2/3886
301.2/3656
302/2621
302/2789
302.1/3253
302.1/3395
302.1/2818
302.2/3389
302.3/3901
303/2622
303.1/3238
303.2/3388
304.1/2924304.1/3236
304.2/3471
305.1/3506
305.1/2926
305.1/2877
305.2/3290
306.1/3506306.1/2929
306.9/2617
307/2621
307.1/3141
307.1/3503
307.1/3864
308/2623
308.1/3257
308.2/3261
309.1/2923
309.1/3649
309.2/3633
310.1/3463
310.1/2923
310.1/2798
310.2/3635310.2/3468
311.1/4131
311.1/3080
311.1/3625
311.2/3875
312.1/3000
312.2/3877312.2/4134
313/2786
313.1/4137
313.1/3290
313.1/3506313.1/2824
313.2/3290
313.2/4137
314/2786
314.1/3289
314.1/4141
314.1/2767
314.1/3857
314.2/3289
315/2509
315/2782
315/2536
315/3270
315.1/2773
315.1/3653
316.1/3087
316.1/3653
316.2/3089
317/2788
317.2/3492
317.2/3092
317.2/3844
317.3/3942
318.1/3246
319.2/3245
321.1/3641
321.1/3673
322.1/3507
322.1/3394
322.1/2899
322.2/3504
322.2/3635322.2/3149
323.1/3503
323.1/3391
323.1/3896
323.2/3511323.2/3496
323.2/3392
323.2/3883
324/3268
324.1/2789
324.1/3509
324.2/3507
324.2/3277
325/3255
325.2/3649
325.2/3238
326.1/3300
326.1/3090
326.2/3420
326.4/4114
327.1/3505
327.1/2927
327.2/3420
327.2/3851
328.1/3508
328.2/3631
328.2/3607
328.2/3376
328.2/3500
329/2921
329/3150
329.1/3499
329.2/3880
329.2/3489
329.2/3611
330.1/3503
330.2/3491
330.2/3166
330.2/3630
330.2/3606
331.1/3498
331.1/3459
331.2/3497
331.2/3453
332/2509
332/2537
332.1/3496
333/2514
333.1/3121
333.1/2548
334/2531334.2/2811
335/2785
335/3278
336/2785
336.1/3533
336.2/3192
336.2/2995
337/2537
337/2509
337.1/3650
337.2/3438
338/2511
338/2536338.1/3022
338.1/2789
338.2/3829
338.2/3630
339.1/4124
339.2/3319
339.2/4124
340.1/3328
340.1/3307
340.2/3317
340.2/2758
341.2/3556
343/2684
343/2597343/2662
343.1/3510
343.1/3650
344/2684
344.2/3350344.2/3152
344.2/2940
345/2684
345/3238
345.1/2692
345.2/3388
345.2/3852
345.3/3386
346.1/3499
346.1/3275
346.2/3493
346.9/2782
347.1/3499
347.2/3494
347.2/3669
348.1/3420348.1/3288
348.2/3499
348.2/3430
348.2/3285349.1/3666
349.1/3421
349.1/3633
349.1/3290
349.2/3278
349.2/3665
349.2/3407
350/3385
350.1/3630
350.2/3626
351.1/3499
351.1/3028
351.2/3503
351.2/3485
351.2/3615
351.9/2564
352.1/3498
352.1/2789
352.2/3490
353/2528
353.1/3498353.2/3495
354/2681
354.1/2782
354.1/3671
354.2/3599
355.2/3104
355.2/3591
355.2/3567
356.1/3869
356.2/3830
357.1/3469
357.2/3474
357.2/3833
358/2783
358/3958
358.1/2778
358.2/3832
359/3507
359/3225359.1/3514
359.2/3720
360/2692
360.3/3397
360.8/2915361/2684
361.1/3167
361.1/2693
361.2/3167
361.9/2922
362/2690
362/2664
362.1/3165
362.1/2685
362.2/3156
362.2/3325
363.1/3259
363.1/3481
363.2/3263364.1/3264
364.2/3262364.2/2582
364.2/3655
364.2/3464
364.9/2553
365/2684
365/2595
365/2659
365/3484
365.1/3388
365.2/3404
365.2/3385
366/2684
366/2662
366.1/2788
366.1/3395
366.2/3842367/2666
367.2/3528
367.2/4132367.2/3584
368.1/2792
368.2/3529
368.2/2786
368.2/4133
368.3/3863 369.2/4113370.2/3044
370.2/3918
371.2/3073
371.2/3605
372.1/3626
372.1/3291
375/3071
376.1/3450
376.2/3459
376.2/3722
377/3074
377/3509
377.1/3077
377.2/2997
377.2/3458
378.2/3831
379.1/3327
379.1/3476
379.1/3505 379.2/3336
379.2/3307
380.1/3154
380.1/3203
380.1/3329
380.2/3333
380.2/2971
380.2/3660
381/2686
381/2663
381.1/3154
381.1/3194
381.2/3194
382/2677
382.1/3246
382.2/3788
383.2/3241383.2/4087
384.2/3999
384.3/4003
385.1/3181
385.2/4134
385.2/3174385.2/3449
386/2535
386.1/3177
386.2/3179
386.2/4133
386.3/3577
387.2/3243
388/3959
388.1/2685
388.1/2663
388.1/3651
389/3886389.1/3646
389.2/3349
390.1/3376
390.2/3353 390.3/3050
391.1/3595
391.1/3632
391.1/3611
391.2/3490391.2/3864
391.2/3610
392.1/3604
392.1/3472
392.1/4127
392.1/3495
392.1/3631
392.2/2929
392.2/3616
394.2/3077
395/4121
395.1/3558
395.2/3719
395.2/3674
395.2/3570
396.1/3348
396.1/3560
396.2/3326
396.2/2868
396.2/3862
397/2780
397.1/3315
397.2/3326397.2/3877
397.2/2872
398.3/4059
398.3/3636
399/2530
399.1/2799
400.1/2792
401.1/3334
401.2/2862
402.1/2688
402.1/3337
402.1/2672
402.2/3325
403.1/2950
403.1/3329
403.2/3889
404/2707
404.1/2686
404.3/3900
405/2686
405/3251
405.2/3391
405.3/3940
407.2/3501
408.1/2894
408.1/3494
408.2/2576
410.2/3938
410.3/3944
411.2/3944
411.3/3933
412.2/3944
412.2/3369
412.3/4118
413.1/3634
413.3/4116
414.1/3624
414.2/3062
416.1/2686
416.2/2915
417.1/2683419.1/3062
419.1/3505
419.2/3854
419.2/3062
419.2/3807
419.3/3531
420.1/3786
420.2/2722
421.1/2798
421.1/3265
421.3/3867
422.2/2887
422.3/2572
423.1/3254
423.2/3272 423.2/3253
424.2/2952
424.2/4005
424.2/3155
424.3/4002
425.1/2950
425.2/2955
426.1/3017
426.2/3060
426.9/2692
427/2687
427/2707
427.1/3015
427.3/3269
428.3/3634
429.2/3888
429.2/4080
429.2/3155
429.2/2951
429.2/4031
429.2/4054
429.4/3464
430.1/2686
430.2/3885
430.2/4073
431.1/2685
431.1/3910
431.2/2880
431.2/3884
432.1/2683 432.2/2719
433/3506
433.1/2682433.2/3810
433.2/2703
434.1/3792
436/3126
436.2/2954
436.2/3631
436.2/3596
436.3/2950
437.1/2789
438.1/3455
438.2/3461
438.2/4074
438.3/4058
438.3/3548
439.1/3456
439.2/3459
439.2/3481
439.3/4054
440.1/3150
440.2/3471
440.2/2854
440.2/3160
440.2/4060
440.3/4059
440.3/3089
441.2/3604
441.3/3495
441.3/3050
442.9/2691
443/2684
444/2666
444.1/2686
444.2/2897
444.2/3883444.3/3867
445/2684
445.1/2683
447.2/3877
447.2/3859
448.1/3016
448.1/3549
448.2/3862
448.2/3550
449.1/2895
449.1/3290
449.1/2743
449.1/3534
449.2/2739
450.1/3879
452.1/3076
452.1/2550
452.2/2547
452.2/3146
453.2/3074
453.2/2959
454.1/3291
454.1/3997
454.2/3286
454.2/3088
455.1/3290
455.2/3288
456.1/3288
456.2/3289
457.1/3292
457.2/4112
457.2/3293
458.2/3046
458.2/2941
458.3/2945
458.9/2940
459.1/4137
459.2/3047
459.3/3866
460.1/3452
460.2/3450 460.4/3550
461.1/3542
461.2/3912
461.3/3933
462.1/3094
462.2/3273
462.3/4433
463.2/3046
463.3/3826
464.1/3047
464.2/3468
464.2/3044
465.2/3466
466.1/3203
466.2/3470466.2/3658
466.2/3200
466.2/3695
467.2/3661
467.2/3699
467.3/3460
468.2/3139
468.2/2940468.2/3718
468.3/2941
469.2/3864
471.3/4112
471.9/2941
472/2954
472.2/3121
473.2/2935
473.2/3290
473.2/3138
473.9/3293
474.1/3075
474.4/3722
475.2/3089475.2/2563
475.2/2536
475.2/2787
475.2/2725
475.2/2511
475.2/2689475.2/2665
475.2/2617
475.2/3058475.2/2830
476.1/3290 476.2/2678
477.1/3288
477.2/3293
478/3033
478.1/3162
478.2/3603
478.2/3639
478.4/3022
479.1/3171
479.2/3599
479.3/3630
480.1/3318480.1/3336
480.2/3317
480.2/3099
480.3/4131
481.1/3318
481.2/3314
482.2/3318
482.2/3598
482.3/3201
483.1/3538
483.1/3054
483.2/3598
483.2/3553
484.1/3249
484.1/3588
484.2/3599
484.2/2845
485.1/4136
485.1/3607
485.2/2847
485.2/3041
485.2/3609
485.3/4134
485.3/3057
486/3425
486.1/3464
486.2/2799
486.2/3476
486.2/3457
486.3/4121
487.2/2799
488.2/2882
490.3/3253
491.3/3840491.3/3539
492.2/3643
492.3/3873
492.3/3633
493.1/2882
493.2/3667
494.2/3428
494.2/3120
494.2/3165
495.2/3430495.3/2930
495.3/3417
496.2/3403496.2/3337
496.2/3442
496.2/3363
497.2/3337
497.2/3407
497.2/3363498.2/3444
498.2/3403
498.4/4051
499.3/4112 499.4/4120
500.1/3045 500.2/2798
500.2/3040501.1/3046
501.2/2797
501.2/3493
502.1/3166
502.1/3189
502.1/4192
502.2/3032
502.2/3171
503.1/3166
503.2/3167
504.1/3260
504.1/3164
504.1/3598
504.2/3260
504.2/3031
505.1/3262
505.2/3264
505.4/3699
506.1/3261
506.2/3391
506.2/3258
506.3/3028
506.3/3310507.1/3017
507.2/3030
507.2/3390
508.1/3367
508.2/3528
508.2/3373
508.3/3839
509.1/3030509.2/3528
509.3/3835
510.1/3471
510.2/3527
510.2/3761511.2/3763
512/2971
512.1/3368
512.2/2923
512.2/3761512.2/3127
512.2/3358
512.3/2926
512.3/3127
512.4/4066
513.1/3756
513.2/3365
513.2/2913
513.3/2928
513.3/3522 513.4/3526
515.2/2869
515.9/2929
516.2/3262
517.1/2909
517.1/2937
517.2/2925
517.9/2512
518.2/3409
518.2/3436
518.2/3112
518.3/3522519/3727
520.2/3212
520.2/3268
520.2/3284
520.3/4135
520.4/4140
521.2/3212
521.3/3163
521.3/4137521.4/4138
522.2/3363
522.2/3426
523.1/3037
523.2/3362
523.2/3426
523.2/3459
523.2/3389
523.3/4145
523.4/4140524.1/3157
524.1/2884
524.2/3696
524.2/3624
524.3/2927
525.2/3697
525.2/3162
526.1/3177
526.2/3693
526.2/3180527.1/3178
527.2/3191
527.2/3701
527.3/3532
528.1/3241
528.1/3179
528.2/2837
528.2/3239
528.2/3168
528.3/2835
529.1/3242
529.2/3240
529.2/3170
529.4/2928
530.2/3352
530.2/3541
530.2/3744
531.2/3350
531.2/3514
532.1/3517
532.2/3489
532.2/2870
532.2/3353
532.2/3458
533/2886
533.1/3351
533.2/3488
533.2/2870
533.2/2681
533.2/3459533.2/3343
533.2/3898
533.3/3892
533.3/3171
533.4/4086
534.1/3455
534.2/3583
534.3/3864
534.3/4115
534.3/2667
535.2/3583
535.3/3117
535.3/3867
535.3/3518 535.3/3832 536.1/3940536.2/3565
536.2/3719
536.2/3673
536.3/3419
537/3930
537.2/3564
537.2/3672
537.2/2870
537.4/3174
537.4/4125
538/3520
538.2/3975
538.3/4134
538.3/4098
538.4/4138
538.9/2509539.2/2908
539.3/4135
539.4/4141
540.4/4132
540.4/3698
540.5/4170
541.4/4132
542.3/3209
543.4/3699 544/3714
544.1/3444
544.1/3370
544.2/3207
544.2/3429
544.2/3389
545.2/3201
545.2/3381
545.3/4120
545.5/2646
546.2/3205
546.2/3018
546.2/3701
546.3/4137
546.3/3016
546.3/3657
546.4/4141
547.2/3021
547.2/3683547.2/3206
547.3/3019
547.3/4139
547.4/4141
548.1/3187
548.1/3425
548.2/3020548.3/3015
548.3/4235
548.4/4232
549/3522
549/3262
549.1/3514
549.1/3186
549.3/4128
549.3/3428
550.1/3237550.2/3665
550.2/3604
550.2/3245
550.3/4130
550.3/3600
550.3/3685
550.4/4134
550.4/4115
551.2/3665
551.2/3013
551.2/3032
551.2/3603
551.3/3601
551.3/3025
552.1/3353
552.1/3369
552.2/3350
552.3/3819
553/2545553.1/3357553.1/4164
553.2/3348
553.2/4057553.3/3817
554.1/3215
554.1/2526
554.2/3339
554.2/3215
554.2/3490
556.2/3111
556.2/3444
556.2/3592
556.3/2914
556.3/3118557/3161
557.1/2574
557.2/2916
557.2/3115
557.3/2914
557.3/3121
558/3514
558.1/3526
558.1/3396
558.2/3535
558.2/3392
559.1/3391
559.1/3520
559.2/3393
559.2/3536
560.2/3626
560.2/3096
560.3/3089
561.2/2916
561.3/3635
561.3/2910
561.4/3504
561.9/2539
562.1/3514
562.2/3426
562.2/3740
563/3515
563.4/4133
564.2/3883
564.2/3452
564.2/3481
564.2/3621
564.3/3049
564.3/3511
564.3/3886
564.3/2977
564.4/4140
565.2/3452
565.2/3881
565.3/3479
565.4/4140
565.4/3903
565.4/2649566.2/3573
566.2/3204
566.3/4130
566.3/3595566.4/4231
567.1/3030
567.2/3573
567.2/3016
567.4/4231
567.5/4234
568.2/3207
568.2/3224
568.3/3617
568.3/2910
568.4/2913
568.4/4231
568.8/2627
569.1/3295569.2/3212
569.3/3616
569.3/3594
569.8/2919
570.1/3212570.3/4124
570.4/3509
570.5/3514
571/3508
571.4/4130
571.4/3715
571.4/3677
571.6/2509
571.8/2913
572.1/3676
572.2/3390
572.2/3663
572.2/3417
572.2/3844
572.3/2827
572.4/3932
572.8/2915
573.2/3004
573.2/3420
573.3/3608
573.3/3391
573.8/2921
574.2/4044
574.3/4051
574.8/2915
575.3/4115
575.4/4130
576/3212
576.2/2858
576.2/3663 576.3/2859
576.3/4112
576.4/4123
576.4/3863
577.2/2861
577.3/4136
577.4/4208
578.3/3941
579.1/2789
579.2/2788
579.3/3844
579.4/3510
579.5/3513
580.1/2789
580.2/3574
580.3/3314
580.5/2523
581.1/2789
581.2/2789
581.2/3575
581.2/2861
582.3/3485
582.3/4122
582.4/3827
583.4/4097584.1/3394
584.2/3507
584.3/3371
584.4/4137
585/3503
585.3/3087
586.1/3285
586.2/3279
586.3/3301
587.2/3274
588/3273
588.2/2794
588.3/2789
589.2/2797
589.4/4134
589.5/4402
590.1/3369
590.1/3453
590.2/3214
590.2/3467
590.2/3007
590.3/3005
591.2/3220
591.2/2982
591.3/3005
591.5/4105
592.2/3629
592.2/2998
592.3/3006
592.4/4130
593/3501
593.2/3274
593.3/3260
593.3/4128
593.4/3705
593.4/3672
594/2618
594.1/3671
594.2/3376
594.2/3408
594.3/3619
594.3/3602
594.4/2537
595/2627
595.2/3004
595.3/3016
596/3029
596.2/3008
596.2/3844
596.2/4052596.3/3807
596.4/3803
597.2/4057
597.2/4128597.3/3812
597.3/3000
597.3/3789
597.4/3795
598.3/3704
599.3/4126
599.4/4131
Loadings plot for all variables
01/27/2014
19
PCA (VIII)
37• 0.04 • 0.02 0.00 0.02 0.04
•0.
06•
0.04
•0.
020.
000.
020.
040.
06
Principal Component Analysis (Loadings) Top 25
PC1
PC
2
●● ●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
356.2/3830357.2/3833396.2/3862
349.1/3290301.2/3391
324/3268548.1/3425
300.2/3390
358.2/3832
412.2/3944382.1/3246322.1/3394
523.2/3459
359/3507
460.1/3452
302.2/3389
472/2954533.2/2870
410.2/3938
466.2/3658
323.1/3391
532.2/3458
556.2/3444
362.1/3165
392.1/3631
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
356.2/3830357.2/3833396.2/3862349.1/3290301.2/3391324/3268548.1/3425300.2/3390358.2/3832412.2/3944382.1/3246322.1/3394523.2/3459359/3507460.1/3452302.2/3389472/2954533.2/2870410.2/3938466.2/3658323.1/3391532.2/3458556.2/3444362.1/3165392.1/3631
Loadings plot for the top 25 varaibles
PCA (IX)
38
Scree plot: variance vs. principle component number
01/27/2014
20
PLS-DA (I)
39
• A supervised method to find a predictive model that describes the direction of maximum covariance between a dataset (X) and the class membership (Y)
• Similar to PCA, the original variables are summarized into much fewer new variables using their weighted averages.
• The new variables are called scores.
• The weighting profiles are called loadings.
• PLS-DA can perform both classification and feature selection.
• Feature importance measure: VIP (Variable Importance in Projection)
PLS-DA (II)
40
• Interpretation of the model– R2X and R2Y
• fraction of the variance that the model explains in the independent (X) and dependent variables (Y)
• Range: 0-1
– Q2Y
• measure of the predictive accuracy of the model
• usually estimated by cross validation or permutation testing
• Range: 0-1
• > 0.5 is considered good while > 0.9 is outstanding
01/27/2014
21
PLS-DA (III)
41
• Note of caution– Supervised classification methods are powerful.
– BUT, they can overfit your data, severely.
Machine LearningClustering
Classification
42
01/27/2014
22
Clustering
• Group similar objects together
• Any clustering method requires– A method to measure similarity/dissimilarity between objects
– A threshold to decide whether an object belongs to a cluster
– A way to measure the distance between two clusters
• Common clustering algorithms– K-means
– Hierarchical
– Self-organizing map
• Unsupervised machine learning techniques
43
Hierarchical clustering (I)
1. Find the two closest objects and merge them into a cluster
2. Find and merge the next two closest objects (or an object and a cluster, or two clusters)
3. Repeat step 2 until all objects have been clustered
44
01/27/2014
23
Hierarchical clustering (II)
• Methods to measure similarity between objects– Euclidean, Manhattan
– Pearson correlation
– Cosine similarity
• Linkage: ways to measure the distance between two clusters
45
single complete centroid average
Hierarchical clustering (III)
46
01/27/2014
24
Classification
• Use a training set of correctly-identified observations to build a predictive model
• Predict to which of a set of categories a new observation belongs
• Supervised machine learning
• Methods– Linear discriminant analysis
– Support vector machine (SVM)
– Artificial neural network (ANN)– k-nearest neighbor
– Random forest
– PLS-DA
47
Software PackagesMetaboAnalyst
XCMS
48
01/27/2014
25
MetaboAnalyst 2.0
• This is an online set of statistical tools
– http://www.metaboanalyst.ca/
• First, create an Excel file for each sample containing three columns, the m/z of each metabolite, its retention time and the peak intensity
• Put the files for a group in a folder.
• Then zip the two folders
• On the MetaboAnalyst website, click on “Welcome, click here to start”
Structure of Excel file for submission to MetaboAnalyst
01/27/2014
26
MetaboAnalyst data entry
01/27/2014
27
Initial parameter setting
Initial evaluation
01/27/2014
28
More assessment of the data
Data filtering
01/27/2014
29
Data transformation and scaling
01/27/2014
30
Two‐fold change cutoff
T‐test cutoff, p <0.05
01/27/2014
31
Volcano plot
Correlation matrix
01/27/2014
32
Peaks differentiating Ctrl/OVEX
PCA 2D‐plot
01/27/2014
33
3D PCA plot
PLSDA 2D‐plot
01/27/2014
34
3D‐PLSDA plot
Results from XCMS Online
68
01/27/2014
35
69
For in-depth statistical analysis and data interpretation, please make an appointment with a biostatistician.
70
Thank you!
top related