mikael kubista mikel.kubista@tataa - reference in qpcr www ... · [email protected]....
Expression profiling
Genes/samples that behave similarly are identified by
their expression patterns
Input data:
1) Expression of genes (2 or more)
2) In many samples (2 or more)
3) As a function of time, drug load, genetic makeup, etc.
Multivariate data
Multiway data
B cell lymphoma
The immunoglobulin light chain constant region has two versions, κ and λ, that are expressed in 60 % and 40 % of B cells, respectively, in healthy individuals.
In non-Hodgkin lymphoma the 60:40 expression ratio is altered due to clonality.
[Figure labels: constant region, variable regions]
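The 60:40 ratio can be turned into a small numerical check. The sketch below is only an illustration, not the method used in the study: it assumes one Cq value per light chain, identical thresholds and amplification efficiencies for both assays, and a hypothetical tolerance around the expected 60 % κ fraction.

```python
def kappa_fraction(cq_kappa: float, cq_lambda: float, efficiency: float = 1.0) -> float:
    """Estimate the fraction of IgL transcripts that are kappa from two Cq values.

    Assumes both assays share the same threshold and the same amplification
    efficiency (1.0 means the product doubles every cycle).
    """
    # Amount of kappa relative to lambda: (1 + E) ** (Cq_lambda - Cq_kappa)
    ratio = (1.0 + efficiency) ** (cq_lambda - cq_kappa)
    return ratio / (1.0 + ratio)


def deviates_from_normal(fraction: float, expected: float = 0.60,
                         tolerance: float = 0.20) -> bool:
    """Flag a kappa fraction far from the expected 60 %.

    The tolerance is a hypothetical cut-off chosen for illustration only.
    """
    return abs(fraction - expected) > tolerance


# Kappa appears one cycle earlier than lambda: about two thirds kappa.
balanced = kappa_fraction(cq_kappa=20.0, cq_lambda=21.0)
# Kappa appears six cycles earlier than lambda: almost only kappa.
skewed = kappa_fraction(cq_kappa=18.0, cq_lambda=24.0)
print(balanced, deviates_from_normal(balanced))
print(skewed, deviates_from_normal(skewed))
```

A real classification would use a cut-off validated on clinical samples rather than the placeholder tolerance used here.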
Kappa & lambda raw data
Classification in scatter plot
[Figure: two scatter plots. Panel 1: number of IgL cDNA amplifications on both axes, ticks from about 12 to 36. Panel 2: number of IgL cDNA on both axes, ticks from 10^2 to 10^7. Regions labeled "Positive".]
Ståhlberg et al., Clin. Chem. 49, 51-59 (2003).
Yeast metabolism
Experimental design
• Four strains of yeast: Wt, Hxt7, Tm6 and Null
• Expression over time after glucose addition: 0 – 60 min
• Expression of genes:
– Ref: BACT, IPPI, PDA
– Glycolysis: TPI, PGK, PDC, ADH1
– Glycogenesis: FBP, MDH2, SUC2, ADH2
– Unknown: ADH3, ADH4, ADH5, ADH6
– HSP, CYC, MIG
Data pre-treatment
1. Correct for off-scale measurements (primer-dimers)
2. Compensate for variations between runs (inter-plate calibration)
3. Assay efficiency correction
4. Normalize with spike (efficiency variation between samples)
5. Normalize to the same amount of sample
6. Average QPCR technical repeats
7. Normalize with reference genes
8. Average technical repeats
9. Normalize with reference samples (paired test)
10. Calculate relative quantities
11. Convert data to log scale (fold changes)
12. Mean-center/autoscale
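The last three steps of the list can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the workflow of any particular software: rows are samples, columns are genes, the first row is used as a hypothetical reference sample, and all assays are assumed to have the same efficiency.

```python
import numpy as np


def pretreat(cq, efficiency=1.0):
    """Steps 10 to 12 for a samples x genes matrix of Cq values."""
    cq = np.asarray(cq, dtype=float)
    # Step 10: quantities relative to the first row (assumed reference sample).
    relative = (1.0 + efficiency) ** (cq[0] - cq)
    # Step 11: log2 scale, so that one unit equals a two-fold change.
    log_fold = np.log2(relative)
    # Step 12a: mean-center each gene (column).
    centered = log_fold - log_fold.mean(axis=0)
    # Step 12b: autoscale, i.e. divide each gene by its standard deviation.
    # A gene with zero variance would give a division by zero here.
    autoscaled = centered / log_fold.std(axis=0, ddof=1)
    return relative, log_fold, centered, autoscaled


# Three samples, two genes (made-up Cq values).
cq_values = [[20.0, 25.0],
             [21.0, 24.0],
             [22.0, 23.0]]
relative, log_fold, centered, autoscaled = pretreat(cq_values)
print(relative)
print(autoscaled)
```

Mean-centering keeps the size of each gene's response, while autoscaling gives every gene unit variance so that weakly and strongly regulated genes weigh equally in later analyses.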
Wild type – temporal responses
[Figure: temporal expression profiles in wild type. Labeled groups: induced genes, repressed genes, ADH3-6, HSP12, CYC1.]
WT – Autoscaled temporal responses
Group 1
[Figure: samples plotted in a space with axes Gene1, Gene2, Gene3]
The regression line (orthogonal least-squares fit to all points) is the direction of greatest variance. It defines the first Principal Component (PC1):
PC1 = C11×Gene1 + C12×Gene2 + C13×Gene3
C11 is the importance of Gene1 in defining PC1 (its “loading”).
Each sample is described in the new space by its distance from the center along PC1 (its “score”).
Sample = Sample(Gene1, Gene2, Gene3)
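Loadings and scores can be computed directly from a mean-centered data matrix. The sketch below uses a singular value decomposition, which is one common way to obtain PC1; the function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def first_principal_component(data):
    """Return the PC1 loadings (C11, C12, C13) and the PC1 score of each sample.

    data: samples x genes matrix, here with columns Gene1, Gene2, Gene3.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)          # move the origin to the data center
    # Rows of vt are unit-length directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[0]                       # weight of each gene in PC1
    scores = centered @ loadings           # signed distance from the center along PC1
    return loadings, scores


# Toy data: Gene2 always changes twice as much as Gene1, Gene3 is constant.
samples = [[1.0, 2.0, 0.0],
           [2.0, 4.0, 0.0],
           [3.0, 6.0, 0.0]]
loadings, scores = first_principal_component(samples)
print(loadings)
print(scores)
```

The sign of a principal component is arbitrary, so loadings and scores may both come out with flipped signs without changing the interpretation.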
[Figure: 3D plot with axes Gene1, Gene2, Gene3]
Often PC1 is not sufficient to describe the data. PC2 is defined as the vector perpendicular to PC1 that accounts for most of the remaining variance:
PC2 = C21×Gene1 + C22×Gene2 + C23×Gene3
One can go on calculating as many PCs as there are genes, but it is not meaningful. 2 or 3 PCs are usually sufficient to account for the information in the data, and such a low-dimensional space is readily visualized.
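How many PCs are "sufficient" can be judged from the share of the total variance that each one explains. The following sketch assumes a samples x genes matrix; the 95 % threshold and the made-up data are illustrative choices, not values taken from the slides.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of the total variance captured by each PC, largest first."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # Squared singular values are proportional to the variance along each PC.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def components_needed(ratios, threshold=0.95):
    """Smallest number of PCs whose cumulative share reaches the threshold."""
    cumulative = np.cumsum(ratios)
    # The small offset guards against rounding just below the threshold.
    count = int(np.searchsorted(cumulative, threshold - 1e-12)) + 1
    return min(count, len(ratios))


# Made-up data: five samples, three genes.
samples = [[1.0, 2.1, 0.3],
           [2.0, 3.9, 0.1],
           [3.0, 6.2, 0.4],
           [4.0, 7.8, 0.2],
           [5.0, 10.1, 0.3]]
ratios = explained_variance_ratio(samples)
print(ratios, components_needed(ratios))
```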
First three score vectors in PCA
PC1 vs. PC2 scores plot (WT)
PC1 vs. PC2 vs. PC3 scores plot (WT)
HXT - HXT7 mutant
HXT – TM6*
HXT - null
Matrix augmentation
Identifying stable reference gene candidates
Optimum number of reference genes
Bias (intergroup variability)
Gene   WT        TM6       HXT7      Null
TPI    -0.8949   0.018     -0.7199   1.5968
SUC    1.152     -0.9102   0.952     -1.1938
PGK    -1.8543   -0.1664   -0.3793   2.3999
PDC    -1.0355   -0.2102   -0.7105   1.9562
PDA    -0.1121   0.0195    0.1379    -0.0454
MIG    0.7051    -0.9445   0.1301    0.1093
MDH    1.6489    0.8617    0.5614    -3.072
IPPI   -0.1418   0.1836    0.0582    -0.1001
HSP    0.0614    0.3367    -0.1136   -0.2845
FBP    1.6707    0.6336    0.1207    -2.4251
CYC    -0.1418   -0.2664   0.1332    0.2749
ACT1   -0.3199   0.0555    0.0301    0.2343
ADH6   -0.248    -0.2977   -0.073    0.6187
ADH5   -0.6824   0.3305    0.2676    0.0843
ADH4   -0.2761   -0.1758   -0.0636   0.5155
ADH3   0.1364    0.2617    0.1989    -0.597
ADH2   1.0114    0.4367    0.4364    -1.8845
ADH1   -0.6793   -0.1664   -0.9668   1.8124
(The slide marks three entries as "OK".)
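One simple way to read the bias table is to rank the genes by how far their values stay from zero across the four strains. The sketch below uses the mean absolute bias as the criterion; this criterion is an assumption made for illustration, and the slides do not state which rule produced the "OK" marks.

```python
# Intergroup bias values copied from the table above (columns: WT, TM6, HXT7, Null).
BIAS = {
    "TPI": (-0.8949, 0.018, -0.7199, 1.5968),
    "SUC": (1.152, -0.9102, 0.952, -1.1938),
    "PGK": (-1.8543, -0.1664, -0.3793, 2.3999),
    "PDC": (-1.0355, -0.2102, -0.7105, 1.9562),
    "PDA": (-0.1121, 0.0195, 0.1379, -0.0454),
    "MIG": (0.7051, -0.9445, 0.1301, 0.1093),
    "MDH": (1.6489, 0.8617, 0.5614, -3.072),
    "IPPI": (-0.1418, 0.1836, 0.0582, -0.1001),
    "HSP": (0.0614, 0.3367, -0.1136, -0.2845),
    "FBP": (1.6707, 0.6336, 0.1207, -2.4251),
    "CYC": (-0.1418, -0.2664, 0.1332, 0.2749),
    "ACT1": (-0.3199, 0.0555, 0.0301, 0.2343),
    "ADH6": (-0.248, -0.2977, -0.073, 0.6187),
    "ADH5": (-0.6824, 0.3305, 0.2676, 0.0843),
    "ADH4": (-0.2761, -0.1758, -0.0636, 0.5155),
    "ADH3": (0.1364, 0.2617, 0.1989, -0.597),
    "ADH2": (1.0114, 0.4367, 0.4364, -1.8845),
    "ADH1": (-0.6793, -0.1664, -0.9668, 1.8124),
}


def rank_by_mean_abs_bias(bias):
    """Return gene names sorted from smallest to largest mean absolute bias."""
    def mean_abs(values):
        return sum(abs(v) for v in values) / len(values)
    return sorted(bias, key=lambda gene: mean_abs(bias[gene]))


candidates = rank_by_mean_abs_bias(BIAS)[:3]
print(candidates)  # ['PDA', 'IPPI', 'ACT1']
```

Under this criterion the three most stable candidates are PDA, IPPI and ACT1, which is consistent with the reference genes named in the experimental design if BACT there refers to ACT1.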
Augmented PC1 vs. PC2 scores plot
Color = presumed function
Symbol = strain
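A plot in which every point is one gene in one strain suggests that the per-strain data matrices were stacked on top of each other before the analysis. The sketch below shows that kind of row-wise augmentation; the interpretation, the function name and the toy numbers are assumptions, not details confirmed by the slides.

```python
import numpy as np


def augment(strain_matrices, gene_names):
    """Stack genes x time-point matrices from several strains row-wise.

    strain_matrices: dict mapping strain name to a genes x time-points array,
                     with rows in the same order as gene_names.
    Returns the augmented matrix and one (strain, gene) label per row.
    """
    blocks = []
    labels = []
    for strain, matrix in strain_matrices.items():
        block = np.asarray(matrix, dtype=float)
        if block.shape[0] != len(gene_names):
            raise ValueError(f"{strain}: expected {len(gene_names)} gene rows")
        blocks.append(block)
        labels.extend((strain, gene) for gene in gene_names)
    return np.vstack(blocks), labels


# Toy example: two genes, three time points, two strains (made-up values).
genes = ["TPI", "FBP"]
data = {
    "WT":   [[0.0, 1.0, 2.0], [0.0, -1.0, -2.0]],
    "HXT7": [[0.0, 0.5, 1.0], [0.0, -0.5, -1.0]],
}
augmented, labels = augment(data, genes)
print(augmented.shape)  # (4, 3)
print(labels)
```

After this stacking, a PCA or clustering run on the augmented matrix treats the same gene in different strains as separate objects, which is what allows coloring by presumed function and choosing the symbol by strain.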
Augmented hierarchical clustering
Forced Self Organized Map
• FBP1, ADH2, MDH2 (all strains), SUC2 (HXT7), HSP12 (HXT7 and TM6*), ADH3 (WT and HXT7) and ADH5 (WT)
• SUC2 (TM6*), HSP12 (WT) and ADH4 (HXT7)
• ADH1, PDC1, TPI1 (all strains), MIG1 (WT and HXT7), PGK1 (WT), ADH4 (TM6*)
• CYC1 (all strains), SUC2 (WT), ADH5 (HXT7 and TM6*), ADH3 (TM6*)
• PGK1 (TM6*)
• MIG1 (HXT7), PGK1 (HXT7), ADH4 (WT), ADH6 (all strains)
Behavior of the genes in three strains
Acknowledgements
Gothenburg University
Anders Ståhlberg
Karin Elbing
Lawrence Livermore Laboratory
Björn Sjögreen
Institute of Biotechnology
Radek Sindelka
Vlasta Ctrnacta
David Svec
Vendula Rusnakova
Razi University
Jahan Ghasemi
A Coruna University
José Manuel Andrade
Ales Tichopad
TATAA Molecular Diagnostics
Ales Tichopad
MultiD Analyses
Amin Forootan
Daniel Lindh
Anders Bergkvist
Symposium in Prague
Developments in Gene Expression
Profiling
May 25-28, 2009
(www.qpcrsymposium.eu)