digital pcr for copy number analysis - qbase+ · 2016-09-13 · digital pcr for copy number...
Digital PCR for copy number analysis
Jo Vandesompele, PhD
Biogazelle CSO, UGent professor
EMBL Advanced Course Digital PCR, Heidelberg, Germany
October 22, 2015
Acknowledgements (A-Z)
Lieven Clement, Els Goetghebeur, Bart Jacobs, Peter Pipelers, Olivier Thas, Matthijs Vynck
Steve Lefever, Björn Menten, Katrien Vanderheyden, Kimberly Verniers, Nurten Yigit
Ariane De Ganck, Nele Nijs
Xavier Alba, Jen Berman, Frank Bizouarn, Viresh Pattel, Svilen Tzonev
Agenda
• introduction
• experiment design
• power analysis
• sensitivity vs. inhibition vs. availability of input
• CNV use cases
• advanced data-analysis
• droplet classification
• combining replicates & multigene normalization
• tips & tricks
Full text papers available on Biogazelle website
http://www.biogazelle.com > Knowledge center > publications
Biogazelle blog on dPCR vs. qPCR
http://www.biogazelle.com/knowledge-center/blog
Digital PCR is emerging as gold standard method for CNV
• Biogazelle is reference lab for Bio-Rad’s QX100/200 droplet digital PCR technology
• Scalable precision and relative sensitivity (needle in the haystack) (“more is better”)
• High accuracy (without calibration)
• Excels in quantification of small differences and rare events
Application domains
• in principle any nucleic acid quantification study (cost/throughput)
• focus on those areas where dPCR excels
• small differences
• CNV analysis (high copy number range, transgene stability testing, cell-free DNA (NIPT, oncogene amplification))
• gene expression (microRNA, splice variants)
• rare events
• pathogens (e.g. viral load in body fluid such as urine)
• mutant cancer cells (tissue, circulating cells or cell-free DNA)
• circulating RNA biomarker (cell-free RNA)
dMIQE guidelines for digital PCR
• Clinical Chemistry, 2013
• co-authored by Biogazelle founders
dMIQE guidelines have 3 goals
1. Design, perform, and report dPCR experiments that have greater scientific integrity
2. Facilitate replication of published experiments adhering to the guidelines
3. Provide critical information that allows reviewers and editors to assess the technical quality of manuscripts
Power analysis is a crucial aspect of experiment design
• Ensure proper setup to find a true difference with statistical significance
• Often ignored
• Limitations of dPCR power analysis in literature
• no or few details on the methods
• no incorporation of replicate variability (instead, reactions are (naively) pooled over replicates)
• not taking all variables into account (e.g. replicates, fraction of negative droplets, …)
• use of meta-analysis methods (instead of ad hoc statistical method)
Digital PCR power analysis is a function of
• true difference you want to see
• number of partitions
• fraction of negative partitions
• number of replicates
• alpha value (type I error, false positive rate, 5%)
• 97% power to detect a 10% difference in copy number using 3 replicated reactions of each 14,000 partitions with 30% negative partitions
• 53% for a 5% difference
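A rough sketch of how these inputs interact. It assumes a Poisson-only model in which partitions are pooled over replicates and between-replicate variability is ignored (the naive simplification criticized earlier), and it compares the concentration of one target between two samples rather than a normalized copy number. It will therefore be more optimistic than the 97% and 53% figures quoted above. The function name and the normal approximation on the log scale are illustrative assumptions, not the method behind the interactive tool.

```python
import math
from statistics import NormalDist


def approx_power(rel_diff, n_partitions, neg_fraction, n_replicates, alpha=0.05):
    """Approximate two-sided power to detect a relative concentration difference.

    Assumptions (not from the slides): Poisson sampling is the only source of
    variance, partitions are pooled over replicates, and log(concentration)
    is approximately normal.
    """
    n = n_partitions * n_replicates
    lam1 = -math.log(neg_fraction)   # copies per partition in sample 1
    lam2 = lam1 * (1.0 + rel_diff)   # copies per partition in sample 2

    def var_log_lambda(lam):
        p_neg = math.exp(-lam)
        # Delta method: Var(lambda_hat) = (1 - p_neg) / (n * p_neg)
        return (1.0 - p_neg) / (n * p_neg * lam ** 2)

    se = math.sqrt(var_log_lambda(lam1) + var_log_lambda(lam2))
    z = math.log(lam2 / lam1) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    nd = NormalDist()
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)


# Example: power rises with the number of replicates at 95% negatives.
for reps in (1, 3, 6):
    print(reps, round(approx_power(0.10, 14_000, 0.95, reps), 3))
```

Under these assumptions, power grows with the number of partitions, the number of replicates, and the size of the difference; a model that includes replicate variability will give lower values.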
Interactive tool to determine power in digital PCR experiments
• power for a given condition
• power ~ number of replicates, fraction of negative partitions, number of partitions, copy number difference
• optimal negative fraction (for max power) ~ copy number difference
• Vynck et al., in preparation
http://vandesompelelab.ugent.be/power/
Power in function of fraction of negative partitions
http://vandesompelelab.ugent.be/power/
• difference of 10%
• 14,000 partitions
• 3 replicates
Power in function of number of replicates
http://vandesompelelab.ugent.be/power/
• difference of 10%
• 14,000 partitions
• 95% negatives
Power in function of number of partitions
http://vandesompelelab.ugent.be/power/
• difference of 15%
• 1 replicate
• 30% negatives
What determines the sensitivity of dPCR?
• Both qPCR and dPCR can detect 1 molecule (precision is higher for dPCR at low concentrations)
• Input amount of nucleic acids
• more cDNA to detect a low abundant transcript (e.g. long non-coding RNA)
• more circulating cell-free DNA to detect a low frequent mutation
intended sensitivity    ng of DNA needed
10.000%                 0.229
1.000%                  2.286
0.100%                  22.857
0.010%                  228.571
0.001%                  2285.714
assuming at least 5 positive droplets are needed for confident calling, a perfectly discriminating assay between wild type and mutant, 14,000 recovered droplets from 20,000 formed
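The table values can be reproduced with a short calculation. This is a reconstruction, not the author's stated formula: it assumes that 5 mutant copies must end up in recovered droplets, that the loaded amount is scaled by the 14,000/20,000 recovery, and that one genome copy weighs 3.2 pg (inferred from the dilution series in this deck, where 100 000 copies correspond to 320 ng). The function name is hypothetical.

```python
def ng_dna_needed(sensitivity, min_positive=5, recovered=14_000, formed=20_000,
                  pg_per_copy=3.2):
    """Input DNA (ng) needed to expect `min_positive` mutant-positive droplets.

    Assumption: pg_per_copy = 3.2 is inferred from 320 ng = 100 000 copies.
    """
    # Mutant copies that must be loaded so that min_positive are recovered.
    mutant_copies_loaded = min_positive / (recovered / formed)
    # Total copies (wild type + mutant) at the intended mutant fraction.
    total_copies = mutant_copies_loaded / sensitivity
    return total_copies * pg_per_copy / 1000.0  # pg -> ng


for s in (0.10, 0.01, 0.001, 0.0001, 0.00001):
    print(f"{s:.3%}  {ng_dna_needed(s):.3f}")
```

Each tenfold gain in intended sensitivity costs ten times more input DNA, which is why availability of input limits rare-event detection.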
Large dynamic range, high precision and accuracy
• Correlation between expected and measured concentrations on a gDNA dilution series (ranging from 100 000 copies/reaction to 5 copies/reaction) (320 ng – 16 pg DNA)
[Figure: log10 (measured concentration, copies/ddPCR reaction) versus log10 (expected concentration, copies/ddPCR reaction); linear fit y = 0.9781x + 0.0695, R² = 0.99877]
Unpurified digested genomic DNA inhibits ddPCR if > 30 v/v%
[Figure: log10 (measured concentration, copies/reaction) versus log10 (gDNA concentration, v/v%); linear fit y = 1.143x + 3.224, R² = 0.990]
cDNA inhibits ddPCR if > 25 v/v%
• Influence of cDNA input amounts (ranging from 5 to 45 v/v%) on measured concentration
[Figure: log10 (measured concentration, copies/reaction) versus log10 (cDNA concentration, v/v%); linear fit y = 0.921x + 3.306, R² = 0.999]
Case 1 – genetic characterization of cell banks
• Therapeutic protein production in biopharmaceutical industry
• Transgene copy number has influence on expression level
• Need for a cell line that is genetically stable throughout the biopharmaceutical manufacturing process
• Genetic characterization of Master Cell Bank (MCB) and Working Cell Bank (WCB)
• Traditionally by Southern blot analysis - laborious and time consuming
• → qPCR method for transgene copy number determination
Case 1 – struggling with qPCR
• Transgene copy number analysis
• Limited accuracy at higher copy numbers
• Compensated by including more PCR replicates and calibrators
(D’haene et al., Methods, 2010)
• Pilot study: synthetic CN series (1-10 copies) measured with 16 qPCR replicates
• Resampling to investigate impact of increased number of replicates & calibrator samples
• Conclusion
• 8 qPCR replicates and 3 calibrator samples are required for CN analysis at increased copy numbers
• Still relatively large deviation from expected copy number in proof of concept study
Case 1 – proof of concept 1
• Copy numbers from duplex assay – gene 1 (performed in triplicate)
• observed normalized copy numbers tightly agree with expected integer copies
[Figure: copy number per sample]
sample:      S1 S2 S3 S4 S5 S6 S7 S8
expected CN: 0 0 1 2 3 4 5 5
Case 1 – proof of concept 2
• Copy numbers from duplex assay – gene 2 (performed in triplicate)
• deviation from expected integer copies for samples 3 and 4
[Figure: copy number per sample]
sample:      S1 S2 S3 S4 S5 S6 S7
expected CN: 1 1 4 4 3 0 1
Case 1 – getting integer copy numbers with ddPCR
• Copy numbers from duplex assay – gene 2 (XbaI restriction digest)
• Restriction digest is required to properly count linked loci (here: tandem repeats)
[Figure: copy number per sample after restriction digest]
sample:      S1 S2 S3 S4 S5 S6 S7
expected CN: 1 1 4 4 3 0 1
Case 1 - ddPCR versus qPCR
• ddPCR has higher accuracy than qPCR
• 3.1 x lower standard deviation on log2 copy numbers
• 2.3 x smaller fold changes between max and min copy number
• Fewer reactions required for ddPCR than for qPCR
• ddPCR requires no external standard or calibrator sample with known copy number
[Figure: M4 gene copy number, ddPCR versus qPCR]
Case 1 – ddPCR based genetic characterization of cell banks
• Copy number
• 24 samples – WCB
• Duplex assay – gene 1
• Expected CN: 5
• Deviation from expected CN
• Average: 0.11
• Standard deviation: 0.078
[Figure: copy number and deviation from expected CN per sample, 01_WCB to 24_WCB]
Case 1 – ddPCR based genetic characterization of cell banks
• ddPCR is very well suited for transgene copy number determination
• Genetic characterization of cell banks for therapeutic protein production
• Transgene copy number analysis in genetically modified (GM) crop research
• Transgenic animal models
• Remark: qPCR is the standard approach in biopharmaceutical industry – will take some time to adopt ddPCR
Case 2 – clinical genetics application
• Detection of chromosomal aneuploidies
• Proof of concept on post-natal samples
• Future: non-invasive prenatal testing (NIPT)
• Challenge to achieve accuracy and precision required to quantify fetal copy numbers in prenatal samples based on low level fetal cfDNA in maternal blood (median amount of 10%)
Case 2 – assay design and validation
• Design of assays for a number of loci on chromosomes for which copy number variations are most often found
• Chromosome 21 (e.g. trisomy 21 or Down syndrome)
• Chromosome 13 (e.g. trisomy 13 or Patau syndrome)
• Chromosome 18 (e.g. trisomy 18 or Edwards syndrome)
• Chromosome X & Y (e.g. Turner syndrome)
• Empirical validation using qPCR
• Standard curve (dilution series) → efficiency QC
• Gel electrophoresis → specificity QC
• ddPCR
• Chromosome specific assays (hydrolysis probe - FAM)
• Reference assay (RPP30 – VIC)
• Gradient PCR → standard protocol is suitable
• gDNA dilution series
• CNV duplex – 3 replicates
Case 2 – copy numbers of control samples
[Figure: copy numbers (y-axis 0 to 2.5) in four panels, Control 1 to Control 4; panel labels: female, male, female, male; loci: A-13q, B-13q, A-18p, A-18q, B-18q, A-21q, B-21q, A-Xp, A-Xq, B-Xq, A-Yp, B-Yp]
Case 2 – copy numbers of cases
[Figure: copy numbers (y-axis 0 to 3.5) in three panels, Case 5, Case 9 and Case 18; panel labels: female Turner; trisomy 21, male; trisomy 18, male; loci: A-13q, B-13q, A-18p, A-18q, B-18q, A-21q, B-21q, C-21q, A-Xp, A-Xq, B-Xq, A-Yp, B-Yp]
Case 2 – proof of concept on post-natal samples
• ddPCR is great for copy number analysis in majority of samples
• Non-integer copy numbers may be observed in difficult samples
• Accuracy and precision need improvements to allow for NIPT
• ultrashort amplicons
• improved cell-free DNA isolation method (300-1000 alleles from 2 ml of plasma)
• multigene normalization (also for gene expression!)
Case 2 – optimization experiment design
• Standard CNV protocol – duplex normalization
• Triplicate ddPCR reactions
• 14 duplex reactions
• Each reaction contains one locus of interest (FAM) to be normalized with reference locus (VIC)
• Normalization against reference locus copy number in the same reaction
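A minimal sketch of duplex normalization, assuming the usual Poisson correction for droplets holding more than one copy and a diploid reference locus (2 copies per genome). Function names and the droplet counts in the example are illustrative, not taken from the slides.

```python
import math


def copies_per_droplet(n_positive, n_total):
    """Poisson-corrected mean copies per droplet from the positive fraction."""
    return -math.log(1.0 - n_positive / n_total)


def duplex_copy_number(target_positive, ref_positive, n_total, ref_copies=2):
    """Copy number of the FAM target relative to the VIC reference in one well.

    Assumption: the reference locus is present at `ref_copies` per genome.
    """
    target = copies_per_droplet(target_positive, n_total)
    reference = copies_per_droplet(ref_positive, n_total)
    return ref_copies * target / reference


# Hypothetical well: 14,000 droplets, 6,317 FAM-positive, 4,616 VIC-positive.
print(round(duplex_copy_number(6317, 4616, 14_000), 2))
```

Because both loci are counted in the same droplets, pipetting and droplet-recovery differences cancel in the ratio.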
Case 2 – optimization experiment design
• Improved CNV protocol – multigene normalization
• Triplicate ddPCR reactions
• 7 duplex reactions
• Each reaction contains a FAM labeled assay and a HEX labeled assay (→ HEX as alternative to VIC; Zen / Iowa Black double quencher probes from IDT)
• No a priori selection of reference gene locus
• Normalization against all other autosomal chromosomes with normal diploid copy number
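A sketch of the idea, assuming a geNorm-style geometric mean of the loci on all other autosomes as the normalization factor. How the "normal diploid" set is chosen is not specified here, so the caller passes chromosomes to exclude; the function names and the concentrations in the example are hypothetical.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def multigene_copy_number(locus, conc, chrom, exclude=()):
    """Copy number of `locus`, normalized to loci on all other chromosomes.

    conc:    dict locus -> Poisson-corrected concentration
    chrom:   dict locus -> chromosome label
    exclude: chromosomes not used as reference (e.g. sex chromosomes or a
             chromosome suspected to be aneuploid); an assumption of this sketch
    """
    references = [c for name, c in conc.items()
                  if chrom[name] != chrom[locus] and chrom[name] not in exclude]
    return 2.0 * conc[locus] / geometric_mean(references)


# Hypothetical concentrations (copies per droplet) for a trisomy 21 sample.
conc = {"A-13q": 0.40, "B-13q": 0.40, "A-18q": 0.40, "A-21q": 0.60}
chrom = {"A-13q": "13", "B-13q": "13", "A-18q": "18", "A-21q": "21"}
print(round(multigene_copy_number("A-21q", conc, chrom), 2))
print(round(multigene_copy_number("A-13q", conc, chrom, exclude=("21",)), 2))
```

Averaging over several reference loci dampens the effect of any single noisy assay, which is the rationale for dropping the a priori reference locus.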
geNorm - multigene normalization
• geNorm – cited more than 8000 times
Vandesompele et al., Genome Biology, 2002
Case 2 – multigene normalization
• Average deviation from integer copy numbers between different normalization strategies
[Figure: deviation from integer CN per sample (Case 5, Case 6, Case 9, Case 16, Case 19, Case 20, Control 4, Control 3, Control 2, Control 1), multigene normalization versus RPP30 normalization]
           multigene normalization    RPP30 normalization
Average    0.015                      0.037
SD         0.008                      0.025
Case 2 – optimization experiment design
• Results show that normalization using other autosomes improves accuracy of copy numbers
• Normalization based on absolute autosomal counts reduces running cost by 50%
Advanced digital PCR data-analysis
• Vynck et al., submitted
• GLMM framework (R and Shiny web app)
• handles replicate wells
• multiple reference gene normalization
• automatic selection and application of stable reference genes
20 samples, 3 replicates each, ~ 14,000 droplets, negative fraction 80-90%, 95% CI
Results from oncogene detection in cell-free DNA from plasma
[Figure: copy number estimate (y-axis 0 to 5) with 95% CI per sample: W95, X1802, X2311, K578, R611, W17, X2323, X2545, Z198, S571, X2601, K585, S130, S494, X1314, X1562, X1659, X2597, S638, X1987]
Comparison of plasma cfDNA and fresh frozen tumor DNA
• in 8/10, there was a perfect agreement on oncogene amplification status
• in 2/10, there is no agreement
• fresh frozen is only marginally elevated (tumor heterogeneity)
• tumor DNA 2.068 (95% CI 2.017-2.121) > elevated
• cfDNA 2.009 (95% CI 1.933-2.089) > normal
More narrow CI with proper statistical processing of replicates
[Figure: meta-analysis versus GLMM; both axes with doubling ticks from 0.25 to 1024]
More narrow CI with GLMM statistical processing of replicates
[Figure: meta-analysis versus GLMM; axis ticks 1, 2, 4]
Legend:
• 3:4 copies oncogene:reference gene (tumor)
• cfDNA without evidence of oncogene amplification
• cfDNA with signs of oncogene amplification (p<0.05)
• Jacobs et al., BMC Bioinformatics, 2014
Partition misclassification has largest impact on accuracy and precision
Interactive tool to inspect sources of variance on absolute quantification
http://users.ugent.be/~bkjacobs/dPCR_VarComp/index.html
Development of a framework for objective partition classification
• Stochastic clustering approach that matches the intuition
• Using the raw data from the QX100
• Multistep approach
• cluster center location (expectation maximization)
• remove the rotation
• univariate projection on each channel
• robustly fit a normal null distribution on the negative peak
• calculate the posterior probability to be negative with respect to the channel for each droplet
• combine both channels
Find cluster centers and remove rotation
Fit the null distribution and calculate posterior probability of the negatives
[Figure panels: no rain, rain]
• red = fitted distribution of the negatives
• black = entire distribution
• probability negative droplet = red/black in the projected point of the droplet
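A single-channel sketch of the red/black ratio, under stated assumptions: negatives are the majority of droplets, the null is fitted by iterative median/MAD clipping (a stand-in for the robust fit; the actual procedure is not detailed here), and the overall density is a plain histogram. The function name, bin count and simulated amplitudes are illustrative only.

```python
import numpy as np


def posterior_negative(amplitude, bins=200, n_iter=5):
    """Per-droplet probability of being negative in one channel."""
    x = np.asarray(amplitude, dtype=float)
    # Robust normal fit on the negative peak (assumes negatives dominate).
    centre = np.median(x)
    sd = 1.4826 * np.median(np.abs(x - centre))
    for _ in range(n_iter):
        core = x[np.abs(x - centre) <= 3.0 * sd]
        centre = np.median(core)
        sd = 1.4826 * np.median(np.abs(core - centre))
    # Share of negatives: the null is symmetric and positives lie above it.
    pi0 = min(1.0, 2.0 * np.mean(x < centre))
    # "Red": fitted distribution of the negatives, scaled by their share.
    null = pi0 * np.exp(-0.5 * ((x - centre) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    # "Black": histogram estimate of the entire distribution.
    density, edges = np.histogram(x, bins=bins, density=True)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    return np.clip(null / density[idx], 0.0, 1.0)


# Simulated well: 80% negatives around 2000, 20% positives around 8000.
rng = np.random.default_rng(1)
negatives = rng.normal(2000.0, 150.0, 11_200)
positives = rng.normal(8000.0, 300.0, 2_800)
post = posterior_negative(np.concatenate([negatives, positives]))
print(np.median(post[:11_200]), post[11_200:].max())
```

Droplets in the rain between the two peaks receive intermediate probabilities instead of a hard call, which is what makes the classification objective rather than threshold-dependent.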
Combine channels and label clusters based on max probability
Gene copy number quantification on digested high quality DNA
Inhibition due to cDNA carryover
Oncogene amplification in cfDNA
Single channel data for low concentration target
Work in progress
• better handling of outlier droplets with lower than negative amplitude (deviating droplet volumes?)
• use combined estimated distribution of no template reactions instead of theoretical normal distribution
General conclusions (1)
• ddPCR is a great tool for copy number analysis
• no need for reference sample with known copy number
• better accuracy and precision compared to qPCR
• Points of attention
• restriction digest is required to quantify linked loci (e.g. tandem repeats)
• Remaining challenges
• non-integer copy numbers for difficult samples
• further improve accuracy and precision to meet NIPT requirements (for instance smaller amplicon size)
General conclusions (2)
• Power analysis is important (and easy)
• interactive tool
• Mathematical framework for combining replicates, selecting reference genes, and multigene normalization
• latent variable, complementary log-log link, GLMM
• Vynck et al., submitted
• Statistical framework for automated (objective) droplet classification
• Jacobs et al., work in progress
Tips & tricks
Template input (1)
• ~1 copy per droplet (CPD) (highest precision is at 1.59)
• range of 1-100 000 copies / 20 µl ddPCR reaction
• 0.00005 - 5 CPD
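The 1.59 optimum follows from Poisson statistics alone. A minimal sketch, assuming binomial sampling of negative droplets and a delta-method variance for the concentration estimate; the function name and the grid search are illustrative choices.

```python
import math


def relative_variance(cpd, n_partitions=20_000):
    """Variance of the estimated copies per droplet, relative to cpd squared.

    With p_neg = exp(-cpd): Var(cpd_hat) = (1 - p_neg) / (n * p_neg)
                                         = (exp(cpd) - 1) / n
    """
    return (math.exp(cpd) - 1.0) / (n_partitions * cpd ** 2)


# Grid search over 0.001 to 8 copies per droplet.
grid = [i / 1000 for i in range(1, 8001)]
best_cpd = min(grid, key=relative_variance)
print(round(best_cpd, 2))  # → 1.59

# Range check: 1 and 100 000 copies spread over 20,000 droplets.
print(1 / 20_000, 100_000 / 20_000)
```

The curve is flat around the optimum, so loading roughly 1 CPD costs little precision while leaving headroom before saturation.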
[Figure: 95% confidence interval fraction (y-axis -0.15 to 0.15) versus fraction of positive droplets; series: 1 well (20,000 droplets) and 3 wells merged]
fraction positive droplets: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
copies per droplet:         0.11 0.22 0.36 0.51 0.69 0.92 1.20 1.61 2.30
Template input (2)
• maximum 25 v/v% unpurified digested gDNA or undiluted cDNA to prevent inhibition (test using your own reagents)
• DNA digest is required for gene copy number analysis, especially for linked loci (not required for FFPE and cell-free DNA)
• integrity of DNA/RNA is as important for dPCR as for qPCR
• Vermeulen et al., Nucleic Acids Research, 2011
ddPCR assay design guidelines
• in house primerXL design pipeline
• primer3 based
• avoid SNPs (Lefever et al., Clinical Chemistry, 2013)
• avoid secondary structures (UNAFold)
• assess specificity (BiSearch / Bowtie)
• target: FAM-IBFQ, reference HEX-IBFQ
• amplicon length <70 nt if possible
• primer Tm: 61-63 °C
• probe Tm: 64-68 °C (65 opt)
• probe length: 14-25 nt (18 opt)
• HaeIII-compatible amplicons
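The numeric guidelines above can be turned into a simple checklist. This is a hypothetical helper, not part of the primerXL pipeline; it only covers the length and Tm ranges listed and leaves SNP, secondary structure, specificity and restriction site checks to dedicated tools.

```python
def check_ddpcr_assay(amplicon_len, primer_tms, probe_tm, probe_len):
    """Return a list of deviations from the guideline ranges (empty if none)."""
    issues = []
    if amplicon_len >= 70:
        issues.append(f"amplicon length {amplicon_len} nt (guideline: <70 nt if possible)")
    for tm in primer_tms:
        if not 61 <= tm <= 63:
            issues.append(f"primer Tm {tm} °C (guideline: 61-63 °C)")
    if not 64 <= probe_tm <= 68:
        issues.append(f"probe Tm {probe_tm} °C (guideline: 64-68 °C)")
    if not 14 <= probe_len <= 25:
        issues.append(f"probe length {probe_len} nt (guideline: 14-25 nt)")
    return issues


# Hypothetical assays: one within all ranges, one outside all of them.
print(check_ddpcr_assay(65, (62.0, 61.5), 65.0, 18))
print(check_ddpcr_assay(120, (59.0, 64.0), 70.0, 30))
```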
Separation of + and - droplets depends on amplicon & probe length
• for amplicons >100 bp, positive intensities drop
• rise in negatives as probe length increases (> 25 nt)
Gradient PCR allows selection of optimal annealing temperature
• gradient from 55-65 °C
• optimal Ta, specificity check
Duplex test validation
• same quantification result as in singleplex
• orthogonal droplet clusters in 2D plot
• orthogonality of duplex assay can be improved by
• Tm matching between target and reference assay
• Droplet PCR Supermix (#186-3023) (adding more resources)