
Genetic Susceptibility to Cancer: An Analysis of Half- and Full-Sibling Data

Melanie Bevier, Marianne Weires, Jan Sundquist, Kari Hemminki

Deutsches Krebsforschungszentrum (DKFZ) Heidelberg

Overview

• Genetic models (twin studies)

• Analysis of Twin Data Using SAS (Rui Feng et al., Biometrics, June 2009)

• Application of the ACE model to half- and full-sibling data

3/25/2011 | Melanie Bevier Molecular and Genetic Epidemiology (C050)

Heritability

• Twin studies

    • are used to analyze the heritability of a trait, broken down into genetic and environmental influences

    • Monozygotic twins: share 100% of their genes

    • Dizygotic twins: share about 50% of their genes

    • Heritable traits: a higher correlation is expected in monozygotic than in dizygotic twins

=> this higher concordance in monozygotic twins is used to estimate the genetic influence
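To illustrate how the excess similarity of monozygotic twins translates into a heritability estimate, the following sketch uses Falconer's classical approximation, h² = 2 (r_MZ - r_DZ). This formula is not part of the slides, which use variance-component models instead; the function name and the correlation values are hypothetical and chosen only for illustration.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are the trait correlations within monozygotic and
    dizygotic twin pairs. This is a rough classical estimate, not the
    variance-component approach used in the slides.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, for illustration only.
h2 = falconer_heritability(r_mz=0.60, r_dz=0.40)
print(f"h2 = {h2:.2f}")
```

The approximation assumes that both twin types share their environment to the same degree, so it is best read as a quick plausibility check.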


Heritability

• Variance-component models are used for the analysis

=> the observed phenotypic variance is partitioned into additive genetic (A), dominant genetic (D), shared environmental (C), and random environmental (E) components

• Heritability:

$$h^2 = \frac{\operatorname{Var}(A) + \operatorname{Var}(D)}{\operatorname{Var}(A) + \operatorname{Var}(D) + \operatorname{Var}(C) + \operatorname{Var}(E)}$$
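As a small worked example of the heritability ratio, the sketch below evaluates it for one set of variance components. The function name and the numbers are hypothetical; they are not estimates from the study.

```python
def heritability(var_a: float, var_d: float, var_c: float, var_e: float) -> float:
    """Share of the total phenotypic variance due to genetic components (A + D)."""
    total = var_a + var_d + var_c + var_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return (var_a + var_d) / total


# Hypothetical variance components, for illustration only.
h2 = heritability(var_a=0.3, var_d=0.1, var_c=0.2, var_e=0.4)
print(f"h2 = {h2:.2f}")
```

Setting `var_d` to zero gives the ACE case, and setting `var_c` to zero gives the ADE case.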


Heritability

• Software for analyzing twin data:

    • for example LISREL, Mx, Mplus


Genetic models

• Genetic model:

$$y_{ij} = X_{ij}\beta + a_{ij} + d_{ij} + c_i + \varepsilon_{ij}$$

    • $X_{ij}\beta$: covariates
    • $a_{ij}$: additive genetic effects
    • $d_{ij}$: dominant genetic effects
    • $c_i$: shared environmental influences
    • $\varepsilon_{ij}$: residual effects

=> ADE or ACE models


Genetic models for binary traits

• Assumption: Y is binary (e.g., in an analysis of cancer: affected or healthy)

• Probit model to fit the data:

$$\operatorname{probit}(\Pr(y_{ij}=1)) = X_{ij}\beta + a_{ij} + d_{ij} + c_i$$

• probit(.) = inverse of the cumulative standard normal distribution function
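The probit link can be tried out with Python's standard library. This is a minimal sketch; the helper names `probit` and `inverse_probit` are introduced here for illustration and do not come from the slides.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(p: float) -> float:
    """Inverse of the cumulative standard normal distribution function (0 < p < 1)."""
    return _STD_NORMAL.inv_cdf(p)


def inverse_probit(eta: float) -> float:
    """Map a linear predictor back to a probability."""
    return _STD_NORMAL.cdf(eta)


print(probit(0.5))                       # probability 0.5 maps to a linear predictor of 0
print(round(inverse_probit(probit(0.2)), 6))
```

A probability of 0.5 corresponds to a linear predictor of zero, and the two functions undo each other.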


Application example: twin data

    proc iml;
    /* t(root(.)) returns the lower-triangular Cholesky factor */
    covA1=sqrsym({1,1,1});
    gA1=t(root(covA1));
    covA2=sqrsym({1,0.5,1});
    gA2=t(root(covA2));
    covD1=sqrsym({1,1,1});
    gD1=t(root(covD1));
    covD2=sqrsym({1,0.25,1});
    gD2=t(root(covD2));
    /* for additive genetic effect, the G design matrix determines
       the model through the model covariance structure: VAR[A]=GZG`.
       The mixed model is then represented as:
       logit(p{y=1}) = mu + ZG + e */
    g1=INSERT(gA1,gD1,0,3); /*combine matrix for MZ twin*/
    g2=INSERT(gA2,gD2,0,3); /*combine matrix for DZ twin*/
    use one1;
    read all var {FAMID zygo} into vzygo;
    close one1;
    nsub=nrow(vzygo);
    g=j(nsub,4,.);
    do i=1 to nsub;
        ind=2-mod(i,2);
        if(vzygo[i,2]=1) then g[i,]=g1[ind,]; /* MZ twin*/
        if(vzygo[i,2]=0) then g[i,]=g2[ind,]; /* DZ twin*/
    end;
    cname={"aone" "atwo" "done" "dtwo"};
    create rmatrix1 from g[colname=cname];
    append from g;
    close rmatrix1;
    quit;

Matrices shown next to the code (rounded to two decimals):

    covA1 = [1 1; 1 1]
    gA1   = [1 0; 1 0]
    g1    = [1 0 1 0; 1 0 1 0]              (MZ twins)
    g2    = [1 0 1 0; 0.5 0.87 0.25 0.97]   (DZ twins)
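The entries 0.87 and 0.97 in the design matrices can be checked by hand. The sketch below reproduces the lower-triangular Cholesky factor of a 2x2 correlation matrix in plain Python, as a stand-in for `t(root(...))` in SAS/IML; the function name `cholesky_2x2` is introduced here for illustration.

```python
import math


def cholesky_2x2(r: float) -> list[list[float]]:
    """Lower-triangular L with L * L^T = [[1, r], [r, 1]], for -1 <= r <= 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation between -1 and 1")
    return [[1.0, 0.0], [r, math.sqrt(1.0 - r * r)]]


for label, r in [("MZ, additive", 1.0), ("DZ, additive", 0.5), ("DZ, dominance", 0.25)]:
    factor = cholesky_2x2(r)
    print(label, [[round(x, 2) for x in row] for row in factor])
```

For r = 0.5 the second row is (0.5, 0.87) and for r = 0.25 it is (0.25, 0.97), matching the rows of g2.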


Application example: twin data

Starting values are estimated with proc logistic.

    title1 'ADE model';
    proc nlmixed data=two2;
    *you need to choose your own initial values;
    *it may be useful to run a simple model such as PROC LOGISTIC;
    *usually you may need to try-and-err;
    parms beta1=-0.192 beta2=0.088 beta3=0.0011 beta4=0.889 beta5=0.230 beta6=-1.092 s1=4 s2=0.01;
    *include the fixed covariates in the next statement;
    fixed1 = beta0+beta1*SEX+beta2*GA+beta3*BW+beta4*inst2+beta5*inst3+beta6*RDS;
    *the following statement specifies additive (a1 and a2) and;
    *dominant (d1 and d2) random effects;
    random1 = a1*aone + a2*atwo + d1*done + d2*dtwo;
    eta = fixed1 + random1;
    *specifies a probit model;
    p = 1 - probnorm(eta);
    model BPD ~ binary(p);
    random a1 a2 d1 d2 ~ normal([0,0,0,0],[s1,0,s1,0,0,s2,0,0,0,s2]) subject=FAMID;
    estimate 'icc' (s1+s2)/(1+s1+s2);
    run;

The statements from random1 through random specify the probit model with random effects. The estimate 'icc' statement gives the heritability.

    title2 'ACE model';
    proc nlmixed data=two2;
    parms beta0=2.793 beta1=0.133 beta2=-0.095 beta3=-0.0011 beta4=-0.929 beta5=-0.283 beta6=1.112 s1=3 s2=1;
    fixed1 = beta0+beta1*SEX+beta2*GA+beta3*BW+beta4*inst2+beta5*inst3+beta6*RDS;
    *the following statement specifies additive (a1 and a2) and;
    *common environment (c) random effects;
    random1 = a1*aone+a2*atwo+c;
    eta = fixed1 + random1;
    p = 1 - probnorm(eta);
    model BPD ~ binary(p);
    random a1 a2 c ~ normal([0,0,0],[s1,0,s1,0,0,s2]) subject=FAMID;
    estimate 'icc' s1/(1+s1+s2);
    run;

The shared environmental influences (c) are the same for both twins of a pair. The estimate 'icc' statement gives the heritability.
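The second argument of `normal(...)` in the RANDOM statement lists the lower triangle of the covariance matrix row by row. The sketch below expands such a list into the full symmetric matrix to make the structure visible; `unpack_lower_triangle` is a hypothetical helper written for this illustration, and the numeric values are the starting values from the parms statements.

```python
def unpack_lower_triangle(values: list[float]) -> list[list[float]]:
    """Rebuild a symmetric matrix from its lower triangle given in row order."""
    n = 0
    while n * (n + 1) // 2 < len(values):
        n += 1
    if n * (n + 1) // 2 != len(values):
        raise ValueError("length is not a triangular number")
    matrix = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = values[k]
            k += 1
    return matrix


# ADE model: [s1,0,s1,0,0,s2,0,0,0,s2] with starting values s1=4, s2=0.01
ade_cov = unpack_lower_triangle([4.0, 0, 4.0, 0, 0, 0.01, 0, 0, 0, 0.01])
# ACE model: [s1,0,s1,0,0,s2] with starting values s1=3, s2=1
ace_cov = unpack_lower_triangle([3.0, 0, 3.0, 0, 0, 1.0])

for row in ade_cov:
    print(row)
```

Both matrices are diagonal: the random effects are independent, and the correlation between relatives enters only through the design columns aone, atwo, done and dtwo.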


Aim of the analysis

• Genetic models (ADE or ACE model) for estimating the heritability of diseases have so far mostly been fitted to twin data

• Twin data are very rare and yield small data sets

• Aim: analyze full siblings and half siblings so that the heritability of cancer can be estimated from a larger data set


Data

• Swedish Family-Cancer Database

• Analysis of all women who

    • have one sister and no half siblings

    • have one maternal half sister and no full siblings

• Analysis of breast cancer, adjusted for age and socioeconomic status


Overview of the data

                 full siblings      half siblings      p*
                 (n = 588,568)      (n = 54,262)
    age, n (%)                                         <.0001
      0-4        52251 (8.88)       3875 (7.33)
      5-9        51547 (8.76)       4297 (7.92)
      10-14      49840 (8.47)       5315 (9.80)
      15-19      44695 (7.59)       5662 (10.43)
      20-24      37828 (6.43)       5136 (9.47)
      25-29      40629 (6.90)       5198 (9.58)
      30-34      45610 (7.75)       5403 (9.96)
      35-39      46166 (7.84)       4792 (8.83)
      40-44      42105 (7.15)       4422 (8.15)
      45-49      35057 (5.96)       3107 (5.73)
      50-54      34643 (5.89)       2598 (4.79)
      55-59      37001 (6.29)       2049 (3.78)
      60-64      35911 (6.10)       1266 (2.33)
      65-69      22029 (3.74)       668 (1.23)
      70+        13257 (2.25)       374 (0.69)

* Wilcoxon rank sum test

Significant difference in the age structure of the two groups => the groups are matched on age


Overview of the data

                         full siblings      half siblings      p*
                         (n = 52,476)       (n = 52,476)
    period, n (%)                                              <.0001
      < 1986             2218 (4.23)        853 (1.63)
      1986-1990          1181 (2.25)        379 (0.72)
      1991-1995          1798 (3.43)        620 (1.18)
      1996-2000          1975 (3.76)        991 (1.89)
      > 2001             45301 (86.33)      49630 (94.58)
    region, n (%)                                              0.51
      Big cities         13368 (25.48)      13603 (25.92)
      Southern region    13436 (25.61)      13176 (25.11)
      Northern region    7439 (14.18)       8352 (15.92)
      Other              18230 (34.74)      17342 (33.05)
    sei, n (%)                                                 0.003
      Agricul            80 (0.15)          52 (0.10)
      Worker             7342 (13.99)       9177 (17.49)
      BlueCol            7890 (15.04)       5976 (11.39)
      Profess            1488 (2.84)        779 (1.48)
      Private            380 (0.72)         351 (0.67)
      Other              35293 (67.26)      36138 (68.87)

* Wilcoxon rank sum test

Example: breast cancer data set

    data breast2;
    set breast;
    id=_n_;
    run;


Logistic regression

Table: Logistic regression analysis for breast cancer.

                Estimate    Standard error    95% CI                    p-value
    Age         -0.0360     0.00295           -0.03895 to -0.03305      <.0001
    Worker      reference
    BlueCol     -0.4501     0.0792            -0.5293 to -0.3702        <.0001
    Profess     -0.4601     0.1345            -0.5946 to -0.3256        0.0006
    Private     -0.7199     0.1924            -0.9123 to -0.5275        0.0002
    Other       1.3575      0.1202            1.2373 to 1.4777          <.0001
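The interval limits listed in the table equal the estimate plus or minus one standard error (for age: -0.0360 ± 0.00295). For comparison, a Wald-type 95% interval on the same scale uses about 1.96 standard errors. The sketch below computes it from the tabulated estimate and standard error; `wald_interval` is a hypothetical helper and not part of the original SAS analysis.

```python
from statistics import NormalDist


def wald_interval(estimate: float, std_error: float, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Estimate and standard error for age, taken from the table above.
lower, upper = wald_interval(-0.0360, 0.00295)
print(f"age: {lower:.5f} to {upper:.5f}")
```

The same call can be repeated for the other rows by passing their estimate and standard error.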


Covariance structures

• Covariance between full sisters:

    • $\operatorname{cov}(a_{i1}, a_{i2}) = \tfrac{1}{2}\sigma_a^2$

• Covariance between half sisters:

    • $\operatorname{cov}(a_{i1}, a_{i2}) = \tfrac{1}{4}\sigma_a^2$
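A short derivation shows how a Cholesky-type construction, as used in the proc iml steps of these slides, yields exactly these covariances. The notation is introduced here as an assumption: $z_1$ and $z_2$ are independent normal variables with mean 0 and variance $\sigma_a^2$, and $r$ is the relatedness coefficient ($r = 1/2$ for full sisters, $r = 1/4$ for half sisters).

```latex
\begin{aligned}
a_{i1} &= z_1, \qquad a_{i2} = r\,z_1 + \sqrt{1-r^2}\,z_2,\\
\operatorname{Var}(a_{i2}) &= r^2\sigma_a^2 + (1-r^2)\,\sigma_a^2 = \sigma_a^2,\\
\operatorname{cov}(a_{i1}, a_{i2}) &= r\,\operatorname{Var}(z_1) = r\,\sigma_a^2 .
\end{aligned}
```

Both sisters therefore keep the same additive genetic variance $\sigma_a^2$, while their covariance is scaled by $r$.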


ACE model (SAS code)

    proc iml;
    covA1=sqrsym({1,0.5,1});
    gA1=t(root(covA1)); /* full sisters */
    covA2=sqrsym({1,0.25,1});
    gA2=t(root(covA2)); /* half sisters */
    use breast2;
    read all var {fam hsm} into fam4;
    close breast2;
    nsub=nrow(fam4);
    g=j(nsub,2,.);
    do i=1 to nsub;
        ind=2-mod(i,2);
        if(fam4[i,2]=1) then g[i,]=gA1[ind,]; /* full sisters */
        if(fam4[i,2]=0) then g[i,]=gA2[ind,]; /* half sisters */
    end;
    cname={"aone" "atwo"};
    create rmatrix from g[colname=cname];
    append from g;
    close rmatrix;
    quit;

Matrices shown next to the code (rounded to two decimals):

    gA1 = [1 0; 0.5 0.87]    (full sisters)
    gA2 = [1 0; 0.25 0.97]   (half sisters)
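The loop in the proc iml step assigns the first row of the Cholesky factor to the first member of each pair and the second row to the second member, via `ind=2-mod(i,2)`. The following Python sketch mirrors that logic under the assumption that the two members of a family occupy adjacent rows; the function name and the input list are hypothetical.

```python
import math


def design_rows(relatedness_per_row: list[float]) -> list[tuple[float, float]]:
    """Return one (aone, atwo) pair per person.

    relatedness_per_row holds the relatedness coefficient of the pair each
    person belongs to (0.5 for full sisters, 0.25 for half sisters). Rows are
    assumed to be sorted so that the two members of a family are adjacent.
    """
    rows = []
    for i, r in enumerate(relatedness_per_row, start=1):
        first_member = i % 2 == 1  # corresponds to ind = 2 - mod(i, 2) being 1
        rows.append((1.0, 0.0) if first_member else (r, math.sqrt(1.0 - r * r)))
    return rows


# Hypothetical input: one full-sister pair followed by one half-sister pair.
g = design_rows([0.5, 0.5, 0.25, 0.25])
for row in g:
    print(tuple(round(x, 2) for x in row))
```

If the data were not sorted by family, the odd/even rule would pair up unrelated rows, so the sort order matters for this construction.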


ACE model (SAS code)

    data rmatrix;
    set rmatrix;
    id=_n_;
    run;

    data two;
    merge breast2 rmatrix;
    by id;
    run;


ACE model (SAS code)

Starting values are estimated with proc logistic.

    title 'ACE model';
    proc nlmixed data=two tech=congra;
    parms beta1=-0.0360 beta2=-0.4501 beta3=-0.4601 beta4=-0.7199 beta5=1.3575;
    fixed1 = beta0+beta1*age+beta2*sei3+beta3*sei4+beta4*sei5+beta5*sei6;
    random1 = a1*aone+a2*atwo+c;
    eta = fixed1 + random1;
    p = 1 - probnorm(eta);
    model code ~ binary(p);
    random a1 a2 c ~ normal([0,0,0],[s1,0,s1,0,0,s2]) subject=fam;
    estimate 'icc' s1/(1+s1+s2);
    run;

The estimate 'icc' statement gives the heritability.


ACE model (SAS output)

    The NLMIXED Procedure

    Specifications
    Data Set                               DATA.TWO
    Dependent Variable                     cancer
    Distribution for Dependent Variable    Binary
    Random Effects                         a1 a2 c
    Distribution for Random Effects        Normal
    Subject Variable                       fam
    Optimization Technique                 Conjugate-Gradient
    Integration Method                     Adaptive Gaussian Quadrature

    Dimensions
    Observations Used        104052
    Observations Not Used    0
    Total Observations       104052
    Subjects                 52026
    Max Obs Per Subject      2
    Parameters               8
    Quadrature Points        7

    Parameters
    beta5     beta7     beta8     beta9     beta10    beta0    s1    s2    NegLogLike
    -0.036    -0.4501   -0.4601   -0.7199   1.3575    1        1     1     45864.9111

    Iteration History
    NOTE: GCONV convergence criterion satisfied.
    NOTE: At least one element of the (projected) gradient is greater than 1e-3.

ACE model (SAS output)

    Fit Statistics
    -2 Log Likelihood            9194.6
    AIC (smaller is better)      9210.6
    AICC (smaller is better)     9210.6
    BIC (smaller is better)      9281.5

    Parameter Estimates
                            Standard
    Parameter   Estimate    Error       DF     t Value   Pr > |t|   Alpha   Lower      Upper      Gradient
    beta5       -0.01969    0.002813    52E3   -7.00     <.0001     0.05    -0.02520   -0.01417   4.918816
    beta7       -0.2315     0.05020     52E3   -4.61     <.0001     0.05    -0.3298    -0.1331    2.543996
    beta8       -0.2310     0.07760     52E3   -2.98     0.0029     0.05    -0.3831    -0.07892   1.825858
    beta9       -0.3989     0.1196      52E3   -3.34     0.0008     0.05    -0.6333    -0.1646    -0.74822
    beta10      0.5586      0.08849     52E3   6.31      <.0001     0.05    0.3852     0.7321     -2.78602
    beta0       3.5842      0.4538      52E3   7.90      <.0001     0.05    2.6947     4.4737     -1.37133
    s1          0.5071      0.5655      52E3   0.90      0.3699     0.05    -0.6014    1.6155     0.13806
    s2          0.08351     0.1866      52E3   0.45      0.6546     0.05    -0.2823    0.4493     1.156674

    Additional Estimates
                        Standard
    Label   Estimate    Error     DF     t Value   Pr > |t|   Alpha   Lower      Upper
    icc     0.3188      0.2771    52E3   1.15      0.2500     0.05    -0.2244    0.8620
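The reported icc can be reproduced from the variance estimates s1 and s2 with the formula from the estimate statement, s1/(1+s1+s2), where the 1 is the residual variance fixed by the probit model. The sketch below also splits the remaining liability variance into shared-environment and residual shares; these two extra shares and the function name are additions for illustration, not quantities reported on the slides.

```python
def ace_variance_shares(s1: float, s2: float) -> dict[str, float]:
    """Shares of the liability variance in the ACE probit model.

    s1: additive genetic variance, s2: shared-environment variance.
    The residual variance is fixed at 1 by the probit link.
    """
    total = 1.0 + s1 + s2
    return {
        "genetic": s1 / total,             # the 'icc' of the estimate statement
        "shared_environment": s2 / total,
        "residual": 1.0 / total,
    }


# Variance estimates from the Parameter Estimates table.
shares = ace_variance_shares(s1=0.5071, s2=0.08351)
for name, value in shares.items():
    print(f"{name}: {value:.4f}")
```

The genetic share evaluates to 0.3188, in agreement with the Additional Estimates table.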


Conclusion

• Using an ACE model, the genetic heritability of breast cancer could be estimated

• An estimated 31.88% of the variance in susceptibility to breast cancer is attributable to genetic factors alone (icc = 0.3188, 95% CI -0.2244 to 0.8620)


References

• Rui Feng, Gongfu Zhou, Meizhuo Zhang, and Heping Zhang. Analysis of twin data using SAS. Biometrics, 65(2):584–589, Jun 2009.

• Orly Levit, Yuan Jiang, Matthew J Bizzarro, Naveed Hussain, Catalin S Buhimschi, Jeffrey R Gruen, Heping Zhang, and Vineet Bhandari. The genetic susceptibility to respiratory distress syndrome. Pediatr Res, 66(6):693–697, Dec 2009.

• Isabella Locatelli, Paul Lichtenstein, and Anatoli I Yashin. The heritability of breast cancer: a Bayesian correlated frailty model applied to Swedish twins data. Twin Res, 7(2):182–191, Apr 2004.
