logistic multiple 2558u - @@ home - kku web hosting · 2016. 3. 22. · iteration 0: log likelihood...
Multiple Logistic Regression
Assistant Professor Nikom Thanomsieng
Department of Biostatistics and Demography
Faculty of Public Health, Khon Kaen University
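Logistic function: $f(z) = \dfrac{1}{1 + e^{-z}}$
[Figure: S-shaped logistic curve of $f(z)$ against $z$, with levels 0, 1/2 and 1 marked on the vertical axis]
$f(-\infty) = \dfrac{1}{1 + e^{\infty}} = 0$
$f(+\infty) = \dfrac{1}{1 + e^{-\infty}} = 1$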
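The limits of the logistic function can be checked numerically. The following is a minimal sketch in Python (the slides themselves use Stata); the function name `logistic` is an illustrative choice, not something defined in the slides.

```python
import math

def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + exp(-z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# f(z) approaches 0 for very negative z, equals 1/2 at z = 0, approaches 1 for large z.
for z in (-10, -2, 0, 2, 10):
    print(f"z = {z:>3}: f(z) = {logistic(z):.4f}")
```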
Fitting Multiple Logistic Regression
Analyzes the relationship between two or more independent variables and a dependent variable.
Dependent variable (outcome, response) = discrete (two possible values)
Independent variables (predictor, explanatory) = continuous or categorical (--> dummy)
Outcome <- predictor, predictor, predictor, ...
Multiple Logistic Regression
Example: analysis of the relationship between age, race, weight gain, smoking, etc. and the occurrence of low birth weight.
LBW: 0 = >=2500, 1 = <2500
Age (years)
Race: 1 = white, 2 = black, 3 = other
Lwt = mother's weight at last period
Smk: 1 = yes, 0 = no
...
FTV = number of physician visits during the first trimester
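Logistic regression analysis
The relationship is written in logit form as
$\hat{y}(x) = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
$\hat{p}(x_i) = \dfrac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$
where $\hat{p}(x_i)$ is the probability that the event occurs and $X_1, \dots, X_p$ are the independent variables in the model.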
Logit model with a polychotomous variable
Recode the variable as dummy variables:
$\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \sum_{l=1}^{k_j-1} \beta_{jl} D_{jl} + \dots + \beta_p x_p$
Example: a variable with k levels gives k-1 dummy variables (k = number of levels or groups).
  variable   D1   D2
  code=1      0    0
  code=2      1    0
  code=3      0    1
Example: the race variable (white, black, other) is recoded as dummy variables as follows:
  race    D1   D2
  white    0    0
  black    1    0
  other    0    1
$\hat{y} = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others}) + \beta_5(ftv)$
In STATA, specify: xi: logit low age lwt i.race ftv
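Multiple logistic regression of low birth weight on age, lwt, race and ftv
$\hat{y} = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \hat\beta_0 + \hat\beta_1\,age + \hat\beta_2\,lwt + \hat\beta_3\,\text{\_Irace\_2} + \hat\beta_4\,\text{\_Irace\_3} + \hat\beta_5\,ftv$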
. xi: logit low age lwt i.race ftv, nolog
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------

. list id low age lwt _Irace_2 _Irace_3 ftv phat

     +--------------------------------------------------------------+
     |  id   low   age   lwt   _Irace_2   _Irace_3   ftv       phat |
     |--------------------------------------------------------------|
  1. |   4     1    28   120          0          1     0   .3434579 |
  2. |  10     1    29   130          0          0     2   .2065388 |
  3. |  11     1    34   187          1          0     0   .2360498 |
  4. |  13     1    25   105          0          1     0   .4102857 |
  5. |  15     1    25    85          0          1     0   .4805368 |
     |--------------------------------------------------------------|
     ...
186. | 223     0    35   170          0          0     1   .1182268 |
187. | 224     0    19   120          0          0     0   .2959572 |
188. | 225     0    24   116          0          0     1   .2732751 |
189. | 226     0    45   123          0          0     1   .1710699 |
     +--------------------------------------------------------------+

$\hat{p} = \dfrac{e^{\beta_0 + \beta_1 age + \beta_2 lwt + \beta_3 \text{\_Irace\_2} + \beta_4 \text{\_Irace\_3} + \beta_5 ftv}}{1 + e^{\beta_0 + \beta_1 age + \beta_2 lwt + \beta_3 \text{\_Irace\_2} + \beta_4 \text{\_Irace\_3} + \beta_5 ftv}}$
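As a check on the formula for the predicted probability, the `phat` values in the listing can be reproduced from the printed coefficients. This is a sketch in Python rather than Stata; the dictionary `coef` and the function `predicted_probability` are illustrative names, and the coefficients are copied (rounded as printed) from the output of `xi: logit low age lwt i.race ftv`.

```python
import math

# Coefficients copied from the Stata table for: xi: logit low age lwt i.race ftv
coef = {"_cons": 1.295366, "age": -0.023823, "lwt": -0.0142446,
        "_Irace_2": 1.003898, "_Irace_3": 0.4331084, "ftv": -0.0493083}

def predicted_probability(row: dict) -> float:
    """p-hat = exp(xb) / (1 + exp(xb)) for one observation."""
    xb = coef["_cons"] + sum(coef[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-xb))

# Observations 1 and 2 of the listing (id = 4 and id = 10).
p_id4 = predicted_probability({"age": 28, "lwt": 120, "_Irace_2": 0, "_Irace_3": 1, "ftv": 0})
p_id10 = predicted_probability({"age": 29, "lwt": 130, "_Irace_2": 0, "_Irace_3": 0, "ftv": 2})
print(f"id 4: {p_id4:.4f}, id 10: {p_id10:.4f}")
```

Both values should agree with the `phat` column to about four decimal places; small differences are expected because the coefficients are rounded.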
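Fitting the model in logistic regression analysis
- Coefficients are estimated by maximum likelihood / IRLS (a topic for further reading and study)
Generalized Linear Model:
- Random component or family: binomial
- Link function: logit, so that $g(\mu) = \ln\left(\dfrac{\mu}{1-\mu}\right) = \ln\left(\dfrac{p}{1-p}\right)$
- Systematic component: $x_1, x_2, \dots, x_p$; the linear model is written as
$\hat{y}(x) = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$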
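Testing the significance of the model
- Use the likelihood ratio test statistic (G), comparing the constant-only model with the fitted model.
- Test the significance of each variable with the Wald test.
$G = -2\ln\left[\dfrac{\text{likelihood without the variable}}{\text{likelihood with the variable}}\right]$
$Z_j = \dfrac{\hat\beta_j}{se(\hat\beta_j)}$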
Testing the significance of the model: worked example
- The likelihood ratio statistic (G) compares the constant-only model with the fitted model as follows.

. xi: logit low age lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

(The coefficient table is identical to the one shown for this model earlier.)

G = -2[(-117.336) - (-111.286)] = 12.099
This shows that at least one variable has a coefficient different from 0.
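The overall likelihood ratio test can be reproduced from the two log likelihoods in the iteration log. The sketch below is Python, not Stata; `chi2_sf` is a helper written for this illustration (it uses the standard closed-form recurrence for the chi-square upper tail with integer degrees of freedom), and the 5 degrees of freedom correspond to the five coefficients other than the constant.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

ll_constant_only = -117.336   # Iteration 0 in the Stata log
ll_fitted = -111.28645        # final log likelihood of the fitted model
G = -2 * (ll_constant_only - ll_fitted)
p_value = chi2_sf(G, df=5)
print(f"G = {G:.3f}, p = {p_value:.4f}")
```

The result should be close to the `LR chi2(5)` and `Prob > chi2` values in the Stata header.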
Testing the significance of the model
- The likelihood ratio test (G) can also compare two nested models, for example Model 1 and Model 2.

Model 1
$\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others})$

. use "H:\Hosmer_logistic\alr_data_Hosmer\logistic\lwt_2556.dta", clear
. xi: logit low age lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
Iteration 0:   log likelihood =   -117.336
…
Iteration 3:   log likelihood = -111.33032
Logistic regression                               Number of obs   =        189
                                                  LR chi2(4)      =      12.01
                                                  Prob > chi2     =     0.0173
Log likelihood = -111.33032                       Pseudo R2       =     0.0512
------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0255238    .033252    -0.77   0.443    -.0906966     .039649
         lwt |  -.0143532   .0065228    -2.20   0.028    -.0271377   -.0015688
    _Irace_2 |   1.003822   .4980135     2.02   0.044     .0277335     1.97991
    _Irace_3 |   .4434608   .3602569     1.23   0.218    -.2626298    1.149551
       _cons |   1.306741   1.069782     1.22   0.222    -.7899926    3.403475
------------------------------------------------------------------------------
. est store m1

Model 2
$\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others}) + \beta_5(ftv)$

. xi: logit low age lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645
Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516
(The coefficient table is identical to the one shown for this model earlier.)
. est store m2
. lrtest m1 m2
Likelihood-ratio test                                  LR chi2(1)  =      0.09
(Assumption: m1 nested in m2)                          Prob > chi2 =    0.7671

. di -2*((-111.33032)-(-111.28645))
.08774
. di chiprob(1,.08774)
.76707018

G = -2[ln(likelihood without the variable) - ln(likelihood with the variable)]
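The nested-model test for `ftv` can be verified by hand in the same way. A small Python sketch follows (the slides use Stata's `lrtest`); it relies on the identity that, for 1 degree of freedom, the chi-square upper tail equals `erfc(sqrt(G/2))`. The variable names are illustrative.

```python
import math

ll_m1 = -111.33032   # Model 1: age, lwt, race
ll_m2 = -111.28645   # Model 2: age, lwt, race, ftv

# G = -2 [ln L(without ftv) - ln L(with ftv)]
G = -2 * (ll_m1 - ll_m2)

# For 1 degree of freedom, P(chi2 > G) = erfc(sqrt(G / 2)).
p_value = math.erfc(math.sqrt(G / 2))
print(f"G = {G:.5f}, p = {p_value:.4f}")
```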
Significance of each variable by the Wald test
$Z_j = \dfrac{\hat\beta_j}{se(\hat\beta_j)}$
------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------
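The `z` and `P>|z|` columns follow directly from the Wald formula. The sketch below (Python, with the illustrative helper `wald_test`) recomputes them for two rows of the table, using the two-sided normal p-value `erfc(|z| / sqrt(2))`.

```python
import math

def wald_test(coef: float, se: float):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p

# (coefficient, standard error) copied from the Stata table
z_lwt, p_lwt = wald_test(-0.0142446, 0.0065407)
z_race2, p_race2 = wald_test(1.003898, 0.4978579)
print(f"lwt:      z = {z_lwt:.2f}, p = {p_lwt:.3f}")
print(f"_Irace_2: z = {z_race2:.2f}, p = {p_race2:.3f}")
```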
Confidence Interval Estimation
- Confidence interval for a coefficient
$100(1-\alpha)\%\ \text{CI of } \beta_i:\ \hat\beta_i \pm Z_{\alpha/2}\, se(\hat\beta_i)$

xi: logit low lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
…
Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486
------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
    _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
    _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
       _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
------------------------------------------------------------------------------
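The interval columns can be reproduced from the coefficient and its standard error. This is a Python sketch with an illustrative helper `coef_ci`; it uses the standard-library `statistics.NormalDist` for the normal quantile.

```python
from statistics import NormalDist

def coef_ci(coef: float, se: float, level: float = 0.95):
    """Wald confidence interval: coef +/- z * se."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return coef - z * se, coef + z * se

# lwt row of the model: xi: logit low lwt i.race
lwt_lower, lwt_upper = coef_ci(-0.0152231, 0.0064393)
print(f"95% CI for lwt: [{lwt_lower:.7f}, {lwt_upper:.7f}]")
```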
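Estimating the probability for an individual observation and its confidence interval
Individual Predicted Probability & Confidence Interval Estimation
- Estimate the variance of the logit (with $x_0 = 1$ for the constant):
$\widehat{Var}[\hat{y}(x)] = \sum_{i=0}^{p} x_i^2\, \widehat{Var}(\hat\beta_i) + \sum_{i=0}^{p} \sum_{j=i+1}^{p} 2 x_i x_j\, \widehat{Cov}(\hat\beta_i, \hat\beta_j)$
- $100(1-\alpha)\%$ CI of the logit: $\hat{y}(x) \pm Z_{\alpha/2}\, se[\hat{y}(x)]$, where $\hat{y}(x) = \sum_{i=0}^{p} \hat\beta_i x_i$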
Example: computing the variance when lwt = 150 and race = white
$\widehat{Var}[\hat{y}(lwt=150, race=white)] = \widehat{Var}(\hat\beta_0) + (lwt)^2\,\widehat{Var}(\hat\beta_1) + (race_{black})^2\,\widehat{Var}(\hat\beta_2) + (race_{other})^2\,\widehat{Var}(\hat\beta_3)$
$\quad + 2(lwt)\,\widehat{Cov}(\hat\beta_0,\hat\beta_1) + 2(race_{black})\,\widehat{Cov}(\hat\beta_0,\hat\beta_2) + 2(race_{other})\,\widehat{Cov}(\hat\beta_0,\hat\beta_3)$
$\quad + 2(lwt)(race_{black})\,\widehat{Cov}(\hat\beta_1,\hat\beta_2) + 2(lwt)(race_{other})\,\widehat{Cov}(\hat\beta_1,\hat\beta_3) + 2(race_{black})(race_{other})\,\widehat{Cov}(\hat\beta_2,\hat\beta_3)$

. di .71429959 + (150^2)*(.00004146) + (0^2)*(.23819397) + (0^2)*(.12721584) + 2*150*(-.00521365) + 2*0*(.02260223) + 2*0*(-.1034968) + 2*150*0*(-.00064703) + 2*150*0*(.00035585) + 2*0*0*(.05320001)
.08305459
xi: logit low lwt i.race, nolog
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
(The output is identical to the fit of this model shown earlier: Log likelihood = -111.62955, LR chi2(3) = 11.41.)

. vce
Covariance matrix of coefficients of logit model

        e(V) |        lwt    _Irace_2    _Irace_3       _cons
-------------+------------------------------------------------
         lwt |  .00004146
    _Irace_2 | -.00064703   .23819397
    _Irace_3 |  .00035585   .05320001   .12721584
       _cons | -.00521365   .02260223   -.1034968   .71429959

. di (-.0152231*150)+(1.081066*0)+(.4806033*0) + .8057535
-1.4777115
. di exp(-1.4777115)/(1+exp(-1.4777115))
.18577333

. prvalue, x(lwt=150 _Irace_2=0 _Irace_3=0)
logit: Predictions for low
Confidence intervals by delta method
                                95% Conf. Interval
  Pr(y=1|x):          0.1858   [ 0.1003,    0.2713]
  Pr(y=0|x):          0.8142   [ 0.7287,    0.8997]
         lwt  _Irace_2  _Irace_3
x=       150         0         0
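$\hat{p} = \dfrac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$
$\hat{p} = \dfrac{e^{\alpha + \beta_1 age + \beta_2 lwt + \beta_3 \text{\_Irace\_2} + \beta_4 \text{\_Irace\_3} + \beta_5 ftv}}{1 + e^{\alpha + \beta_1 age + \beta_2 lwt + \beta_3 \text{\_Irace\_2} + \beta_4 \text{\_Irace\_3} + \beta_5 ftv}}$
This gives the probability of a low-birth-weight infant when lwt = 150 and race = white.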
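Confidence Interval Estimation
- Confidence interval for p
$100(1-\alpha)\%$ CI of the true logit: $\hat{y} \pm Z_{\alpha/2}\, se(\hat{y})$, where $\hat{y}$ is the estimated logit
$100(1-\alpha)\%$ CI of $p$: $\dfrac{e^{\hat{y} \pm Z_{\alpha/2}\, se(\hat{y})}}{1 + e^{\hat{y} \pm Z_{\alpha/2}\, se(\hat{y})}}$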
. do "I:\cat2011\95ci_p_logit.do"
. di (exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459)))))
.11480659
. di (exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459)))))
.28641379
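The whole calculation for lwt = 150, race = white (logit, variance of the logit, and the logit-based interval for p) can be put together in one place. The following Python sketch copies the coefficients and covariance matrix from the Stata output; the names `beta`, `V` and `v` are illustrative. Note that this interval is the logit-transformed one computed by the `di` commands, which differs from the delta-method interval reported by `prvalue`.

```python
import math
from statistics import NormalDist

names = ["lwt", "_Irace_2", "_Irace_3", "_cons"]
beta = {"lwt": -0.0152231, "_Irace_2": 1.081066, "_Irace_3": 0.4806033, "_cons": 0.8057535}
# Lower triangle of e(V), copied from the `vce` output.
V = {("lwt", "lwt"): 0.00004146,
     ("_Irace_2", "lwt"): -0.00064703, ("_Irace_2", "_Irace_2"): 0.23819397,
     ("_Irace_3", "lwt"): 0.00035585, ("_Irace_3", "_Irace_2"): 0.05320001,
     ("_Irace_3", "_Irace_3"): 0.12721584,
     ("_cons", "lwt"): -0.00521365, ("_cons", "_Irace_2"): 0.02260223,
     ("_cons", "_Irace_3"): -0.1034968, ("_cons", "_cons"): 0.71429959}

def v(a: str, b: str) -> float:
    """Symmetric lookup in the covariance matrix."""
    return V[(a, b)] if (a, b) in V else V[(b, a)]

x = {"lwt": 150, "_Irace_2": 0, "_Irace_3": 0, "_cons": 1}   # race = white

logit = sum(beta[n] * x[n] for n in names)
# x' V x: squared terms plus twice each covariance term.
var_logit = sum(x[a] * x[b] * v(a, b) for a in names for b in names)

z = NormalDist().inv_cdf(0.975)
half_width = z * math.sqrt(var_logit)
expit = lambda t: math.exp(t) / (1 + math.exp(t))
p_hat, p_lower, p_upper = expit(logit), expit(logit - half_width), expit(logit + half_width)
print(f"logit = {logit:.7f}, var = {var_logit:.8f}")
print(f"p = {p_hat:.4f}, 95% CI = [{p_lower:.4f}, {p_upper:.4f}]")
```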
Interpretation of the fitted model: odds ratio
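- Dichotomous variable: 2 levels or 2 groups
Two independent variables: $x_1$ coded 0/1, with $x_2$ held at a fixed value (adjusted for $x_2$)
$a = P[y=1 \mid x_1=1, x_2] = \dfrac{e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}$
$b = P[y=0 \mid x_1=1, x_2] = 1 - \Pr[y=1 \mid x_1=1, x_2] = \dfrac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}$
$c = P[y=1 \mid x_1=0, x_2] = \dfrac{e^{\beta_0 + \beta_1(0) + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2 x_2}} = \dfrac{e^{\beta_0 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_2 x_2}}$
$d = P[y=0 \mid x_1=0, x_2] = 1 - \Pr[y=1 \mid x_1=0, x_2] = \dfrac{1}{1 + e^{\beta_0 + \beta_2 x_2}}$
              y = 1   y = 0
  x1 = 1        a       b
  x1 = 0        c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\beta_0 + \beta_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_2 x_2}} = e^{(\beta_0 + \beta_1 + \beta_2 x_2) - (\beta_0 + \beta_2 x_2)} = e^{\beta_1}$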
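Example: in a multiple logistic regression with smoke and age, we want to interpret the odds ratio for smoke, adjusted for age.
$a = P[low=1 \mid smoke=1, age] = \dfrac{e^{\beta_0 + \beta_1(1) + \beta_2 age}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 age}}$
$b = P[low=0 \mid smoke=1, age] = 1 - \Pr[low=1 \mid smoke=1, age] = \dfrac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 age}}$
$c = P[low=1 \mid smoke=0, age] = \dfrac{e^{\beta_0 + \beta_1(0) + \beta_2 age}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2 age}} = \dfrac{e^{\beta_0 + \beta_2 age}}{1 + e^{\beta_0 + \beta_2 age}}$
$d = P[low=0 \mid smoke=0, age] = 1 - \Pr[low=1 \mid smoke=0, age] = \dfrac{1}{1 + e^{\beta_0 + \beta_2 age}}$
                 low = 1   low = 0
  smoke = 1         a         b
  smoke = 0         c         d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\beta_0 + \beta_1 + \beta_2 age}}{e^{\beta_0 + \beta_2 age}} = e^{(\beta_0 + \beta_1 + \beta_2 age) - (\beta_0 + \beta_2 age)} = e^{\beta_1}$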
Hence the odds ratio computed from a logistic regression equation is called the adjusted odds ratio.
Example: smoke = 1 is the variable under study
- age is the control variable
- age takes the same value in each group being compared
$OR_i = e^{\beta_i}$ (adjusted OR)
Computing the odds ratio from a logistic regression equation
- It measures the strength of the association.
- The value obtained is adjusted for the effect of every other variable in the model and is called the adjusted odds ratio.
Dependent variable D <- Exposure (E), Control (C), Control (C), Control ...
Meaning of the odds ratio from a logistic regression equation
- After controlling for the factors $C_i$, exposure to factor E carries OR times the risk of D compared with no exposure to E.
. logit low smoke age, or
Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -113.66733
Iteration 2:   log likelihood = -113.63815
Iteration 3:   log likelihood = -113.63815
Logistic regression                               Number of obs   =        189
                                                  LR chi2(2)      =       7.40
                                                  Prob > chi2     =     0.0248
Log likelihood = -113.63815                       Pseudo R2       =     0.0315
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   1.997405    .642777     2.15   0.032     1.063027    3.753081
         age |   .9514394   .0304194    -1.56   0.119     .8936481    1.012968
------------------------------------------------------------------------------
Meaning of the odds ratio from the logistic regression equation
- After controlling for age, smoking carries 1.997 times the risk of a low-birth-weight infant compared with not smoking.
Polychotomous variables
- Independent variables with more than 2 levels or groups
- Create k-1 dummy variables
Example: a variable with k levels gives k-1 dummy variables (k = number of levels or groups).
  level/group   D1   D2
  code=1         0    0   (reference cell)
  code=2         1    0
  code=3         0    1
Comparisons: code=2 vs code=1, and code=3 vs code=1
Three independent variables: $x_1$ coded 0, 1, 2, with $x_2$ and $x_3$ held at fixed values (adjusted for $x_2$, $x_3$)
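$a = P[y=1 \mid x_1=1, x_2, x_3] = \dfrac{e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}$
$b = P[y=0 \mid x_1=1, x_2, x_3] = 1 - P[y=1 \mid x_1=1, x_2, x_3] = \dfrac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}$
$c = P[y=1 \mid x_1=0, x_2, x_3] = \dfrac{e^{\beta_0 + \beta_1(0) + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2 x_2 + \beta_3 x_3}} = \dfrac{e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}$
$d = P[y=0 \mid x_1=0, x_2, x_3] = 1 - P[y=1 \mid x_1=0, x_2, x_3] = \dfrac{1}{1 + e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}$
              y = 1   y = 0
  x1 = 1        a       b
  x1 = 0        c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\beta_0 + \beta_1 + \beta_2 x_2 + \beta_3 x_3}}{e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}} = e^{(\beta_0 + \beta_1 + \beta_2 x_2 + \beta_3 x_3) - (\beta_0 + \beta_2 x_2 + \beta_3 x_3)} = e^{\beta_1}$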
Example: in a multiple logistic regression with age, lwt, i.race (_Irace_2) and ftv, we want to interpret the odds ratio for _Irace_2, that is, adjusted for age, lwt, i.race (_Irace_3) and ftv. In this derivation $\beta_1$ denotes the coefficient of _Irace_2, and $\eta = \beta_2\,age + \beta_3\,lwt + \beta_4\,\text{\_Irace\_3} + \beta_5\,ftv$ collects the adjusting terms.
$a = P[y=1 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{e^{\beta_0 + \beta_1(1) + \eta}}{1 + e^{\beta_0 + \beta_1(1) + \eta}}$
$b = P[y=0 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = 1 - P[y=1 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{1}{1 + e^{\beta_0 + \beta_1(1) + \eta}}$
$c = P[y=1 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{e^{\beta_0 + \beta_1(0) + \eta}}{1 + e^{\beta_0 + \beta_1(0) + \eta}} = \dfrac{e^{\beta_0 + \eta}}{1 + e^{\beta_0 + \eta}}$
$d = P[y=0 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = 1 - P[y=1 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{1}{1 + e^{\beta_0 + \eta}}$
                    y = 1   y = 0
  _Irace_2 = 1        a       b
  _Irace_2 = 0        c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\beta_0 + \beta_1 + \eta}}{e^{\beta_0 + \eta}} = e^{\beta_1}$
Hence the odds ratio computed from a logistic regression equation is called the adjusted odds ratio.
Example: _Irace_2 (black) is the variable under study
- age, lwt, _Irace_3 (other) and ftv are the control variables
- age, lwt, _Irace_3 (other) and ftv take the same values in each group being compared
$OR_i = e^{\beta_i}$ (adjusted OR)
. xi: logit low age lwt i.race ftv,or
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645
Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9764586   .0329355    -0.71   0.480     .9139936    1.043193
         lwt |   .9858564   .0064482    -2.18   0.029     .9732989    .9985759
    _Irace_2 |   2.728898   1.358603     2.02   0.044     1.028513    7.240436
    _Irace_3 |   1.542043   .5585894     1.20   0.232     .7581543     3.13643
         ftv |   .9518876   .1591923    -0.29   0.768     .6858544    1.321111
------------------------------------------------------------------------------
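The odds-ratio table is the coefficient table exponentiated: each odds ratio is $e^{\hat\beta}$ and each interval endpoint is the exponential of the corresponding endpoint on the coefficient scale. A short Python sketch, with the coefficient-scale values copied from the earlier output of the same model (the dictionary name `coef_table` is illustrative):

```python
import math

# name: (coefficient, lower 95% limit, upper 95% limit) on the logit scale
coef_table = {
    "age":      (-0.023823, -0.0899317, 0.0422857),
    "lwt":      (-0.0142446, -0.0270641, -0.0014251),
    "_Irace_2": (1.003898, 0.0281143, 1.979681),
    "_Irace_3": (0.4331084, -0.2768684, 1.143085),
    "ftv":      (-0.0493083, -0.3770899, 0.2784733),
}

# Exponentiate the estimate and both limits to move to the odds-ratio scale.
odds_ratios = {name: tuple(math.exp(value) for value in row)
               for name, row in coef_table.items()}

for name, (or_hat, lower, upper) in odds_ratios.items():
    print(f"{name:>9}: OR = {or_hat:.4f}  [{lower:.4f}, {upper:.4f}]")
```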
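Interpreting the odds ratio: continuous variables
- A change of 1 unit is often not clinically interesting, e.g. age increasing by 1 year or blood pressure increasing by 1 mmHg.
- A change of 5, 10, ... units is usually more appropriate.
- Conversely, when x ranges over 0-1, a change of 1 unit is too large; an increase of 0.01 may be more appropriate.
- The odds ratio for a change of c units in a continuous variable is computed as follows:
$OR(c) = e^{c\hat\beta_1}$
$95\%\ \text{CI of } OR(c) = e^{\,c\hat\beta_1 \pm Z_{\alpha/2}\, c\, se(\hat\beta_1)}$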
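As an illustration of the c-unit formula, the sketch below applies it to `lwt` from the full model. The choice c = 10 is an assumption made for this example only (the slides do not work this case), and `or_for_change` is an illustrative helper name; the code is Python rather than Stata.

```python
import math
from statistics import NormalDist

def or_for_change(beta: float, se: float, c: float, level: float = 0.95):
    """Odds ratio and CI for a c-unit change: exp(c*beta +/- z * c * se)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre, half = c * beta, z * abs(c) * se
    return math.exp(centre), math.exp(centre - half), math.exp(centre + half)

# lwt coefficient and standard error from: xi: logit low age lwt i.race ftv
# c = 10 is a hypothetical choice of increment for illustration.
or_10, lower_10, upper_10 = or_for_change(-0.0142446, 0.0065407, c=10)
print(f"OR(10) = {or_10:.4f}, 95% CI = [{lower_10:.4f}, {upper_10:.4f}]")
```

With c = 1 the same function returns the odds ratio and interval printed in the `lwt` row of the odds-ratio table.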
Interpreting the odds ratio: Change in Odds or Percent
. listcoef
logit (N=189): Factor Change in Odds
  Odds of: 1 vs 0
----------------------------------------------------------------------
         low |        b        z    P>|z|      e^b   e^bStdX     SDofX
-------------+--------------------------------------------------------
         age | -0.02382   -0.706    0.480   0.9765    0.8814    5.2987
         lwt | -0.01424   -2.178    0.029   0.9859    0.6469   30.5794
    _Irace_2 |  1.00390    2.016    0.044   2.7289    1.4144    0.3454
    _Irace_3 |  0.43311    1.196    0.232   1.5420    1.2309    0.4796
         ftv | -0.04931   -0.295    0.768   0.9519    0.9491    1.0593
----------------------------------------------------------------------

. listcoef, percent
logit (N=189): Percentage Change in Odds
  Odds of: 1 vs 0
----------------------------------------------------------------------
         low |        b        z    P>|z|        %     %StdX     SDofX
-------------+--------------------------------------------------------
         age | -0.02382   -0.706    0.480     -2.4     -11.9    5.2987
         lwt | -0.01424   -2.178    0.029     -1.4     -35.3   30.5794
    _Irace_2 |  1.00390    2.016    0.044    172.9      41.4    0.3454
    _Irace_3 |  0.43311    1.196    0.232     54.2      23.1    0.4796
         ftv | -0.04931   -0.295    0.768     -4.8      -5.1    1.0593
----------------------------------------------------------------------
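The columns of the two `listcoef` tables are related by simple transformations of the coefficient b: the factor change is $e^{b}$, the factor change per standard deviation of x is $e^{b \cdot SD}$, and the percentage changes are $100(e^{b} - 1)$ and $100(e^{b \cdot SD} - 1)$. A Python sketch, assuming these standard definitions and using the rounded `b` and `SDofX` values from the tables (so results match only to about three decimals):

```python
import math

# name: (b, SD of X) copied from the listcoef output
rows = {
    "age":      (-0.02382, 5.2987),
    "lwt":      (-0.01424, 30.5794),
    "_Irace_2": (1.00390, 0.3454),
    "_Irace_3": (0.43311, 0.4796),
    "ftv":      (-0.04931, 1.0593),
}

def odds_summaries(b: float, sd_x: float) -> dict:
    """Factor and percentage change in odds per unit and per SD of x."""
    factor, std_factor = math.exp(b), math.exp(b * sd_x)
    return {"factor": factor, "std_factor": std_factor,
            "percent": 100 * (factor - 1), "std_percent": 100 * (std_factor - 1)}

summary = {name: odds_summaries(b, sd) for name, (b, sd) in rows.items()}
for name, s in summary.items():
    print(f"{name:>9}: e^b = {s['factor']:.4f}, e^bStdX = {s['std_factor']:.4f}, "
          f"% = {s['percent']:.1f}, %StdX = {s['std_percent']:.1f}")
```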
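Confounding & Interaction
Thai terminology for "interaction": antarakiriya (Mathematics Glossary, Royal Institute, B.E. 2547) or kan patisamphan (NSTDA).
- Confounding is assessed with the "delta-beta-hat-percent" ($\Delta\hat\beta\%$); a change of > 20% is suggested as important.
Example: assessing confounding and interaction in three situations
- No statistical adjustment or interaction
- Statistical adjustment but no statistical interaction
- Statistical adjustment & interaction
$\Delta\hat\beta\% = 100 \times \dfrac{(\hat\theta_{reduce} - \hat\beta_{Full})}{\hat\beta_{Full}}$
where $\hat\theta_{reduce}$ is the coefficient from the smaller model and $\hat\beta_{Full}$ is the coefficient from the larger model.
Note that the `di` commands in the examples that follow divide by the reduced-model coefficient (b1) rather than the full-model coefficient, so the printed percentages differ from what this formula gives.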
No statistical adjustment or interaction
. logit FRACTURE PRIORFRAC
…
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |    1.06383   .2230811     4.77   0.000     .6265986     1.50106
       _cons |  -1.416651   .1304641   -10.86   0.000    -1.672356   -1.160946
------------------------------------------------------------------------------
. mat b1=e(b)

. logit FRACTURE PRIORFRAC HEIGHT
...
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |   1.013009   .2253996     4.49   0.000      .571234    1.454784
      HEIGHT |  -.0453531   .0173787    -2.61   0.009    -.0794146   -.0112915
       _cons |   5.894665   2.795904     2.11   0.035     .4147926    11.37454
------------------------------------------------------------------------------
. mat b2=e(b)
. di ((b1[1,1]-b2[1,1])/b1[1,1])*100
4.7771187

. gen PH=PRIORFRAC*HEIGHT
. logit FRACTURE PRIORFRAC HEIGHT PH
...
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |  -3.055134   5.790416    -0.53   0.598    -14.40414    8.293873
      HEIGHT |  -.0544845   .0218529    -2.49   0.013    -.0973153   -.0116537
          PH |   .0253921   .0361138     0.70   0.482    -.0453896    .0961739
       _cons |   7.361277   3.510281     2.10   0.036     .4812532     14.2413
------------------------------------------------------------------------------
Statistical adjustment but no statistical interaction
. logit MYOPIC GENDER
...
------------------------------------------------------------------------------
      MYOPIC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      GENDER |   .3664706   .2403654     1.52   0.127     -.104637    .8375781
       _cons |  -2.083007   .1792488   -11.62   0.000    -2.434328   -1.731685
------------------------------------------------------------------------------
. mat b1=e(b)

. logit MYOPIC GENDER SPHEQ
…
------------------------------------------------------------------------------
      MYOPIC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      GENDER |   .5580827   .2850533     1.96   0.050    -.0006116    1.116777
       SPHEQ |  -3.844609   .4171492    -9.22   0.000    -4.662206   -3.027011
       _cons |  -.2260938   .2527147    -0.89   0.371    -.7214055     .269218
------------------------------------------------------------------------------
. mat b3=e(b)
. di ((b1[1,1]-b3[1,1])/b1[1,1])*100
-52.285816

. gen gs=GENDER*SPHEQ
. logit MYOPIC GENDER SPHEQ gs
…
------------------------------------------------------------------------------
      MYOPIC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      GENDER |   .4915759   .4156574     1.18   0.237    -.3230975    1.306249
       SPHEQ |  -3.948287   .6353135    -6.21   0.000    -5.193479   -2.703096
          gs |   .1850561   .8421906     0.22   0.826    -1.465607    1.835719
       _cons |  -.1910971   .2999269    -0.64   0.524     -.778943    .3967487
------------------------------------------------------------------------------
Statistical adjustment & interaction
. logit FRACTURE PRIORFRAC
...
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |    1.06383   .2230811     4.77   0.000     .6265986     1.50106
       _cons |  -1.416651   .1304641   -10.86   0.000    -1.672356   -1.160946
------------------------------------------------------------------------------
. mat b1=e(b)

. logit FRACTURE PRIORFRAC AGE
…
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |    .838835   .2341556     3.58   0.000     .3798985    1.297772
         AGE |   .0411928   .0121788     3.38   0.001     .0173228    .0650629
       _cons |  -4.214295   .8478396    -4.97   0.000     -5.87603    -2.55256
------------------------------------------------------------------------------
. mat b3=e(b)
. di ((b1[1,1]-b3[1,1])/b1[1,1])*100
21.149487

. gen PA=PRIORFRAC*AGE
. logit FRACTURE PRIORFRAC AGE PA
…
------------------------------------------------------------------------------
    FRACTURE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   PRIORFRAC |   4.961339    1.81022     2.74   0.006     1.413372    8.509305
         AGE |   .0625149   .0154607     4.04   0.000     .0322124    .0928173
          PA |   -.057382   .0250141    -2.29   0.022    -.1064087   -.0083553
       _cons |  -5.689421    1.08408    -5.25   0.000    -7.814179   -3.564662
------------------------------------------------------------------------------
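The three confounding checks can be recomputed from the printed coefficients. The sketch below is Python (not the slides' Stata) and the helper `delta_beta_pct` is an illustrative name. It reports the percentage change with both possible denominators: the reduced-model coefficient, which is what the `di` commands in the examples use, and the full-model coefficient, which is what the delta-beta-hat-percent formula on the slide specifies.

```python
def delta_beta_pct(reduced: float, full: float, denominator: str = "full") -> float:
    """Percentage change in a coefficient between the reduced and full models."""
    base = full if denominator == "full" else reduced
    return 100.0 * (reduced - full) / base

# label: (coefficient in reduced model, coefficient in full model)
examples = {
    "PRIORFRAC, adding HEIGHT": (1.06383, 1.013009),
    "GENDER, adding SPHEQ":     (0.3664706, 0.5580827),
    "PRIORFRAC, adding AGE":    (1.06383, 0.838835),
}

for label, (reduced, full) in examples.items():
    as_printed = delta_beta_pct(reduced, full, "reduced")   # matches the di output
    by_formula = delta_beta_pct(reduced, full, "full")      # slide formula
    print(f"{label}: {as_printed:.2f}% (reduced denominator), "
          f"{by_formula:.2f}% (full denominator)")
```

Under the > 20% guideline, the HEIGHT example stays below the threshold and the SPHEQ and AGE examples exceed it in absolute value, whichever denominator is used.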
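Computing the odds ratio when there is an interaction
The calculation proceeds as follows:
1) Build the model containing the interaction term and write the logit separately for each level of the risk variable (two logits)
2) Compute the difference between the two logits
3) Exponentiate the value obtained in step 2
Let f be the risk variable and x a covariate. The logit is written as
$g(f, x) = \ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 f + \beta_2 x + \beta_3 f x$
With two independent variables (F = $f_1$ = 1 is the risk factor, $f_0$ = 0 the reference, x = covariate):
$OR = \exp[\beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)]$
Confidence interval
$(\hat\beta_1 + \hat\beta_3 x) \pm Z_{\alpha/2}\, \widehat{SE}[\ln \widehat{OR}(F=1, F=0, X=x)]$
$\widehat{Var}[\ln \widehat{OR}(F=1, F=0, X=x)] = \widehat{Var}(\hat\beta_1) + x^2\, \widehat{Var}(\hat\beta_3) + 2x\, \widehat{Cov}(\hat\beta_1, \hat\beta_3)$
Example with priorfrac as the risk variable and age as the covariate:
$\hat{g}(f_1, x) = \hat{g}(priorfrac=1, age) = \hat\beta_0 + \hat\beta_1(1) + \hat\beta_2(age) + \hat\beta_3(1)(age)$
$\hat{g}(f_0, x) = \hat{g}(priorfrac=0, age) = \hat\beta_0 + \hat\beta_1(0) + \hat\beta_2(age) + \hat\beta_3(0)(age)$
$\hat{g}(f_1, x) - \hat{g}(f_0, x) = [\hat\beta_0 + \hat\beta_1(1) + \hat\beta_2(age) + \hat\beta_3(1)(age)] - [\hat\beta_0 + \hat\beta_2(age)] = \hat\beta_1 + \hat\beta_3(age)$
$OR = \exp(\hat\beta_1 + \hat\beta_3\, age)$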
logit fracture priorfrac age pa, noheader nolog
------------------------------------------------------------------------------
    fracture |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   priorfrac |   4.961339    1.81022     2.74   0.006     1.413372    8.509305
         age |   .0625149   .0154607     4.04   0.000     .0322124    .0928173
          pa |   -.057382   .0250141    -2.29   0.022    -.1064087   -.0083553
       _cons |  -5.689421    1.08408    -5.25   0.000    -7.814179   -3.564662
------------------------------------------------------------------------------

. vce
Covariance matrix of coefficients of logit model

             | fracture
        e(V) |  priorfrac         age          pa       _cons
-------------+------------------------------------------------
fracture     |
   priorfrac |  3.2768972
         age |  .01663278   .00023903
          pa | -.04491676  -.00023903    .0006257
       _cons | -1.1752302  -.01663278   .01663278   1.1752302

. di exp(4.961339 + (-.057382*(68.562)))
2.7929945

. lincom priorfrac + pa*(68.562) , or
 ( 1)  [fracture]priorfrac + 68.562*[fracture]pa = 0
------------------------------------------------------------------------------
    fracture | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   2.792988   .6784603     4.23   0.000     1.734998    4.496133
------------------------------------------------------------------------------
Example: priorfrac (1 = yes, 0 = no); age = continuous (years)
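The `lincom` result can be checked against the interaction formulas using the printed coefficients and the `vce` entries for priorfrac and pa. This is a Python sketch with an illustrative helper `interaction_or`; because the inputs are rounded as printed, the interval agrees with Stata only to about three decimals.

```python
import math
from statistics import NormalDist

# Coefficients and covariance entries copied from the Stata output above.
b_priorfrac, b_pa = 4.961339, -0.057382
var_priorfrac, var_pa, cov_priorfrac_pa = 3.2768972, 0.0006257, -0.04491676

def interaction_or(age: float, level: float = 0.95):
    """OR for priorfrac = 1 vs 0 at a given age, with a Wald CI on the log scale."""
    log_or = b_priorfrac + b_pa * age
    # Var[ln OR] = Var(b1) + age^2 * Var(b3) + 2 * age * Cov(b1, b3)
    var_log_or = var_priorfrac + age ** 2 * var_pa + 2 * age * cov_priorfrac_pa
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * math.sqrt(var_log_or)
    return math.exp(log_or), math.exp(log_or - half), math.exp(log_or + half)

or_hat, or_lower, or_upper = interaction_or(68.562)
print(f"OR = {or_hat:.4f}, 95% CI = [{or_lower:.3f}, {or_upper:.3f}]")
```

Calling `interaction_or` at other ages shows how the odds ratio for a prior fracture changes with age when the interaction term is in the model.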
Modeling Strategy
Two goals of mathematical modeling
(1) To obtain a valid estimate of the relationship between an explanatory variable and the response variable
(2) To obtain a good predictive model
Different strategies for different goals
- Prediction goal -> use computer algorithms:
forward selection, backward elimination, stepwise, all possible regressions
- Validity goal -> for etiologic research, standard computer algorithms are not appropriate because they do not take into account the roles that variables play, such as confounders and effect modifiers (interaction)
Model-Building Strategies: Guidelines & Methods for Logistic Regression
Variable selection: the "most parsimonious model"
- minimizes the number of variables in the model
- is more likely to be numerically stable
- is more easily generalized
Purposeful selection
Steps for building a logistic regression model
(Hosmer & Lemeshow, 2000; 2013; Bursac, 2008)
Step 1:-A univariable analysis of each independent variable.
-A careful univariable analysis of each variable
-Any variable whose univariable test has a p-value of
less than 0.25 should be included in the first
multivariable model.
Step 2: -Fit a multivariable model containing all covariates
identified for inclusion at step 1 and
-to assess the importance of each covariate using the
p-value of its Wald statistic.
-Variables that do not contribute at traditional levels of
significance should be eliminated & a new model fit.
-The newer, smaller model should be compared to the
old, larger model using the partial likelihood ratio test.
Step 3: Compare the values of the estimated coefficients in the
smaller model to their respective values from the large
model.
-Any variable whose coefficient has changed markedly in magnitude
should be added back into the model as it is important in the sense of
providing a needed adjustment of the effect of the variables that remain
in the model.
-Cycle through steps 2 and 3 until it appears that all of the important
variables are included in the model and those excluded are clinically
and/or statistically unimportant.
- Hosmer et al. use the "delta-beta-hat-percent" as a measure of the
change in magnitude of the coefficients. They suggest a significant
change as > 20%.
$\Delta\hat\beta\% = 100 \times \dfrac{(\hat\theta_{reduce} - \hat\beta_{Full})}{\hat\beta_{Full}}$
where $\hat\theta_{reduce}$ is the coefficient from the smaller model and $\hat\beta_{Full}$ is the coefficient from the larger model.
Step 4: Add each variable not selected in Step 1 to the model
obtained at the end of step 3, one at a time, and check its
significance either by the Wald statistic p-value or the
partial likelihood ratio test
if it is a categorical variable with more than 2
levels. This step is vital for identifying variables that, by
themselves, are not significantly related to the outcome but
make an important contribution in the presence of other
variables. Refer to the model at the end of Step 4 as the
preliminary main effects model.
Step 5: Once we have obtained a model that we feel contains the
essential variables, we examine more closely the variables
in the model. Refer to the model at the end of Step 5 as
the main effects model.
Step 6: Once we have the main effects model, we check for
interactions among the variables in the model.
Only consider the statistical significance of interactions
and as such, they must contribute to the model at
traditional levels, such as 5% or even 1%.
Step 7: Before any model becomes the final model we must
assess its adequacy and check its fit.
Purposeful selection algorithm (Bursac, et al., 2008)
[Flowchart: fit a univariate model with each variable; create a data set with the covariates whose p-value is below the input p-value; fit the multivariate model; identify the variable with the maximum p-value and check whether it exceeds the input p-value; if yes, remove the variable and check whether the change in beta exceeds the allowed change (yes: keep the variable; no: delete the variable), then evaluate the next variable; if no, test the association with each variable not originally selected, including the preliminary model variables; the result is the final main effects model.]
The modeling strategy involves three stages:
(Kleinbaum & Klein, 2002)
(1) variable specification,
(2) interaction assessment, and
(3) confounding assessment followed by consideration
of precision.
Example: analysis of data from the University of Massachusetts AIDS Research Unit (UMARU) Impact Study (UIS)
  id        ID number
  age       Age at enrollment
  beck      Beck Depression Score
  ivhx      IV drug use history (1=never, 2=previous, 3=recent)
  ndrugtx   Number of prior drug treatments
  race      Subject's race (0=white, 1=other)
  treat     Treatment randomization (0=short, 1=long)
  site      Treatment site (0=A, 1=B)
  dfree     Returned to drug use (1=remained, 0=otherwise)
Step 1: -A univariable analysis of each independent variable .
-Any variable whose univariable test has a p-value of
less than 0.25 should be included in the first
multivariable model.
- A careful univariable analysis of each variable
- Univariable logistic regression (y = 0, 1) against every independent variable
- Nominal and ordinal variables: recode as dummy variables and analyze with univariable logistic regression, examining the Wald test and likelihood ratio statistics; or analyze the contingency table with the likelihood ratio chi-square or Pearson chi-square
- Continuous variables: analyze with univariable logistic regression, examining the Wald test and likelihood ratio test
- Univariable analysis (crude analysis): retain variables with p-value < .25
(Hosmer & Lemeshow 2000: p.118; p-value 0.15-0.20)
- Variables used to build the model should be important (clinically or biologically meaningful) and justified
. logit dfree
Iteration 0:   log likelihood = -326.86446
Logistic regression                               Number of obs   =        575
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -326.86446                       Pseudo R2       =    -0.0000
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -1.068691    .095599   -11.18   0.000    -1.256061     -.88132
------------------------------------------------------------------------------
. estimates store A
. logit dfree age
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -326.16602
Iteration 2:   log likelihood = -326.16544
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.40
                                                  Prob > chi2     =     0.2371
Log likelihood = -326.16544                       Pseudo R2       =     0.0021
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181723    .015344     1.18   0.236    -.0119014     .048246
       _cons |  -1.660226   .5110844    -3.25   0.001    -2.661933   -.6585194
------------------------------------------------------------------------------
. logit dfree age, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.40
                                                  Prob > chi2     =     0.2371
Log likelihood = -326.16544                       Pseudo R2       =     0.0021
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.018338   .0156254     1.18   0.236     .9881691    1.049429
------------------------------------------------------------------------------
. estimates store B
. lrtest A B
Likelihood-ratio test                                 LR chi2(1)  =      1.40
(Assumption: A nested in B)                           Prob > chi2 =    0.2371
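The likelihood-ratio test reported by lrtest can be checked by hand from the two log likelihoods. The following is a minimal Python sketch, not part of the original Stata session; the function name lr_test_df1 is hypothetical, and the closed-form tail probability used here is valid only for 1 degree of freedom.

```python
import math

def lr_test_df1(ll_reduced, ll_full):
    """Likelihood-ratio statistic and p-value when one parameter is added (df = 1)."""
    g = max(2.0 * (ll_full - ll_reduced), 0.0)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p

# Model A (constant only) against model B (constant + age), values from the output above.
g, p = lr_test_df1(-326.86446, -326.16544)
print(f"LR chi2(1) = {g:.2f}, Prob > chi2 = {p:.4f}")
```

The printed values should agree with the lrtest A B line to the displayed precision.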
. lincom 10*age, or
 ( 1)  10 age = 0
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.199282    .184018     1.18   0.236      .887795    1.620055
------------------------------------------------------------------------------
*** odds ratio for a 10-year increase in age
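The lincom result is the exponential of c times the coefficient, with the confidence limits scaled the same way. A small Python sketch of that arithmetic follows; the function name and the constant Z_975 are illustrative assumptions, not Stata output, and c is assumed positive so that the limits keep their order.

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile, for a 95% interval

def odds_ratio_for_change(coef, se, c):
    """Odds ratio and 95% CI for a c-unit increase in a continuous covariate (c > 0)."""
    point = math.exp(c * coef)
    lower = math.exp(c * (coef - Z_975 * se))
    upper = math.exp(c * (coef + Z_975 * se))
    return point, lower, upper

# Coefficient and standard error of age from the univariable model above.
print(odds_ratio_for_change(0.0181723, 0.015344, 10))
```

With c = 10 this should reproduce the lincom 10*age line up to rounding.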
. logit dfree beck
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       0.64
                                                  Prob > chi2     =     0.4250
Log likelihood = -326.54621                       Pseudo R2       =     0.0010
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beck |   -.008225   .0103428    -0.80   0.426    -.0284965    .0120464
       _cons |  -.9272829   .2003166    -4.63   0.000    -1.319896   -.5346696
------------------------------------------------------------------------------
. estimates store C
. lrtest A C
Likelihood-ratio test                                 LR chi2(1)  =      0.64
(Assumption: A nested in C)                           Prob > chi2 =    0.4250
. logit dfree beck, or
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       0.64
                                                  Prob > chi2     =     0.4250
Log likelihood = -326.54621                       Pseudo R2       =     0.0010
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beck |   .9918087    .010258    -0.80   0.426     .9719057    1.012119
------------------------------------------------------------------------------
. lincom 5*beck, or
 ( 1)  5 beck = 0
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .959709   .0496302    -0.80   0.426     .8672027    1.062083
------------------------------------------------------------------------------
*** odds ratio for a 5 point increase in BECK
. logit dfree ndrugtx
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =      11.84
                                                  Prob > chi2     =     0.0006
Log likelihood = -320.94485                       Pseudo R2       =     0.0181
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ndrugtx |  -.0749582    .024681    -3.04   0.002     -.123332   -.0265844
       _cons |  -.7677805    .130326    -5.89   0.000    -1.023215   -.5123462
------------------------------------------------------------------------------
. logit dfree ndrugtx, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =      11.84
                                                  Prob > chi2     =     0.0006
Log likelihood = -320.94485                       Pseudo R2       =     0.0181
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ndrugtx |   .9277822   .0228986    -3.04   0.002     .8839701    .9737658
------------------------------------------------------------------------------
. estimates store D
. lrtest A D
Likelihood-ratio test                                 LR chi2(1)  =     11.84
(Assumption: A nested in D)                           Prob > chi2 =    0.0006
. xi: logit dfree i.ivhx
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(2)      =      13.35
                                                  Prob > chi2     =     0.0013
Log likelihood = -320.18821                       Pseudo R2       =     0.0204
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iivhx_2 |  -.4810199   .2657063    -1.81   0.070    -1.001795    .0397548
    _Iivhx_3 |  -.7748382   .2165765    -3.58   0.000     -1.19932   -.3503561
       _cons |  -.6797242   .1417395    -4.80   0.000    -.9575285   -.4019198
------------------------------------------------------------------------------
. xi: logit dfree i.ivhx, or
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(2)      =      13.35
                                                  Prob > chi2     =     0.0013
Log likelihood = -320.18821                       Pseudo R2       =     0.0204
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iivhx_2 |   .6181526    .164247    -1.81   0.070     .3672198    1.040556
    _Iivhx_3 |   .4607783   .0997937    -3.58   0.000      .301399    .7044372
------------------------------------------------------------------------------
. estimates store E
. lrtest A E
Likelihood-ratio test                                 LR chi2(2)  =     13.35
(Assumption: A nested in E)                           Prob > chi2 =    0.0013
. logit dfree race
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       4.62
                                                  Prob > chi2     =     0.0315
Log likelihood = -324.55269                       Pseudo R2       =     0.0071
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |   .4591026   .2109763     2.18   0.030     .0455967    .8726085
       _cons |  -1.193922   .1141504   -10.46   0.000    -1.417653   -.9701919
------------------------------------------------------------------------------
. logit dfree race, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       4.62
                                                  Prob > chi2     =     0.0315
Log likelihood = -324.55269                       Pseudo R2       =     0.0071
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |   1.582653   .3339022     2.18   0.030     1.046652    2.393145
------------------------------------------------------------------------------
. estimates store F
. lrtest A F
Likelihood-ratio test                                 LR chi2(1)  =      4.62
(Assumption: A nested in F)                           Prob > chi2 =    0.0315
. logit dfree treat
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       5.18
                                                  Prob > chi2     =     0.0229
Log likelihood = -324.27534                       Pseudo R2       =     0.0079
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .437162   .1930633     2.26   0.024     .0587649    .8155591
       _cons |  -1.297816    .143296    -9.06   0.000    -1.578671   -1.016961
------------------------------------------------------------------------------
. logit dfree treat, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       5.18
                                                  Prob > chi2     =     0.0229
Log likelihood = -324.27534                       Pseudo R2       =     0.0079
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   1.548307   .2989212     2.26   0.024     1.060526    2.260439
------------------------------------------------------------------------------
. estimates store G
. lrtest A G
Likelihood-ratio test                                 LR chi2(1)  =      5.18
(Assumption: A nested in G)                           Prob > chi2 =    0.0229
. logit dfree site
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.67
                                                  Prob > chi2     =     0.1968
Log likelihood = -326.0315                        Pseudo R2       =     0.0025
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        site |   .2642236   .2034167     1.30   0.194    -.1344658     .662913
       _cons |   -1.15268   .1170732    -9.85   0.000    -1.382139   -.9232202
------------------------------------------------------------------------------
. logit dfree site, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.67
                                                  Prob > chi2     =     0.1968
Log likelihood = -326.0315                        Pseudo R2       =     0.0025
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        site |   1.302419   .2649338     1.30   0.194     .8741828    1.940437
------------------------------------------------------------------------------
. estimates store H
. lrtest A H
Likelihood-ratio test                                 LR chi2(1)  =      1.67
(Assumption: A nested in H)                           Prob > chi2 =    0.1968
Table: simple (univariable) logistic regression results
Variable   Coefficient   SE       OR      95% CI        Likelihood ratio   p value
age           0.018      0.0153   1.018   0.99, 1.05         1.40          0.2371
beck         -0.008      0.0103   0.992   0.97, 1.01         0.64          0.4250
ndrugtx      -0.075      0.0247   0.928   0.88, 0.97        11.84          0.0006
ivhx2        -0.481      0.2657   0.618   0.37, 1.04        13.35          0.0013
ivhx3        -0.775      0.2166   0.460   0.30, 0.70
race          0.459      0.2109   1.583   1.05, 2.39         4.62          0.0315
treat         0.437      0.1931   1.548   1.06, 2.26         5.18          0.0229
site          0.264      0.2034   1.302   0.87, 1.94         1.67          0.1968
(The likelihood ratio statistic and p value for ivhx are a single 2-df test of ivhx2 and ivhx3 together.)
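Each row of the table follows from the coefficient and its standard error. As a minimal Python sketch (the function name wald_summary and the constant Z_975 are hypothetical, not part of the lecture), the odds ratio, 95% CI, Wald z and two-sided p-value can be computed like this:

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile

def wald_summary(coef, se):
    """Odds ratio, 95% CI, Wald z and two-sided p-value from a logit coefficient."""
    z = coef / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (math.exp(coef - Z_975 * se), math.exp(coef + Z_975 * se))
    return {"or": math.exp(coef), "ci": ci, "z": z, "p": p}

# Univariable model for treat, values taken from the Stata output above.
row = wald_summary(0.437162, 0.1930633)
print(row)
```

For treat this should match the reported odds ratio 1.548, interval 1.06 to 2.26, z = 2.26 and P>|z| = 0.024 up to rounding.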
The variable beck has a p-value of 0.426, so beck is dropped from the analysis.
- Decide which variables are important by examining the Wald statistic.
- Any variable with a p-value > 0.25 is removed from the model.
- However, a variable with a p-value > 0.25 may still be kept in the model, for example when it is known to be an important control factor or when there is another compelling reason to retain it.
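The screening rule above can be written as a short filter. This is a Python sketch under stated assumptions: the function name screen_univariable and the forced argument (for variables kept on subject-matter grounds) are hypothetical, and the p-values are the likelihood ratio p-values from the univariable table.

```python
def screen_univariable(p_values, entry_p=0.25, forced=()):
    """Keep variables whose univariable p-value is below entry_p, plus any forced ones."""
    return [name for name, p in p_values.items() if p < entry_p or name in forced]

# Likelihood ratio p-values from the univariable models (ivhx is the 2-df test).
lr_p = {
    "age": 0.2371, "beck": 0.4250, "ndrugtx": 0.0006, "ivhx": 0.0013,
    "race": 0.0315, "treat": 0.0229, "site": 0.1968,
}
print(screen_univariable(lr_p))
```

With the 0.25 cutoff only beck is excluded, which is the set of covariates carried into the first multivariable model.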
Step 2: Fit a multivariable model containing all covariates identified for inclusion at Step 1, and assess the importance of each covariate using the p-value of its Wald statistic.
. use "K:\hosmer_data\logistic\uis.dta", clear
. xi: logit dfree age ndrugtx i.ivhx race treat site
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -310.17928
Iteration 2:   log likelihood = -309.62871
Iteration 3:   log likelihood = -309.62413
Iteration 4:   log likelihood = -309.62413
Logistic regression                               Number of obs   =        575
                                                  LR chi2(7)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.62413                       Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0503708   .0173224     2.91   0.004     .0164196     .084322
     ndrugtx |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
    _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
    _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
        race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
       treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
        site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
       _cons |  -2.405405   .5548058    -4.34   0.000    -3.492805   -1.318006
------------------------------------------------------------------------------
- Examine the p-value of the Wald statistic for every variable.
- The variable race has a p-value of 0.311.
- The variable site has a p-value of 0.494.
- Previous studies identify race as an important factor to control for, and site is the randomization variable for the study area, so both variables are kept in the model even though their p-values exceed 0.25.
* The p-values considered here come from the Wald statistic. When the number of observations in each outcome group is inadequate relative to the number of variables in the model, the likelihood ratio test is the recommended statistic.
- Assess how strongly a confounder influences the other variables.
- Judge this from the change in the coefficients, called the "delta-beta-hat-percent" or $\Delta\hat{\beta}\%$ (Hosmer, et al. 2003, 2013).
- A delta-beta-hat-percent > 20% is regarded as a significant change.
$\Delta\hat{\beta}\% = 100 \times \dfrac{\hat{\beta}_{reduce} - \hat{\beta}_{Full}}{\hat{\beta}_{Full}}$
Step 3: Compare the values of the estimated coefficients in the
smaller model to their respective values from the large
model.
- Some studies recommend using the change in the effect estimate, such as the odds ratio, instead,
- because a coefficient is on the log odds ratio scale and it is not clear what a given change in a coefficient means (Kleinbaum & Klein, 2002, p 215).
- For the odds ratio, a change of 10% is the recommended cutoff; a change that large suggests the variable influences the main effect variable (Kleinbaum, Kupper, Morgenstern, 1982; Greenland, 1989; Mickey & Greenland, 1989).
* Bachand & Hosmer (1999) showed that a change in a coefficient does not always mean that the variable is a confounder.
- Hosmer et al. (2000, 2013) use $\Delta\hat{\beta}\% = 100 \times \dfrac{\hat{\beta}_{reduce} - \hat{\beta}_{Full}}{\hat{\beta}_{Full}}$
. xi: logit dfree age ndrugtx i.ivhx race treat
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(6)      =      34.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.8567                        Pseudo R2       =     0.0520
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0509605    .017309     2.94   0.003     .0170354    .0848856
     ndrugtx |  -.0631998   .0256525    -2.46   0.014    -.1134778   -.0129219
    _Iivhx_2 |  -.5928725   .2864333    -2.07   0.038    -1.154272   -.0314735
    _Iivhx_3 |  -.7600441   .2489941    -3.05   0.002    -1.248064   -.2720245
        race |   .2081089    .221453     0.94   0.347    -.2259309    .6421488
       treat |    .438959   .1991429     2.20   0.028     .0486461     .829272
       _cons |  -2.355786   .5501049    -4.28   0.000    -3.433972     -1.2776
------------------------------------------------------------------------------
- Here site is removed (only to illustrate the calculation, because site is an important variable).
Variables    Full model   Reduce model   Delta beta hat (%)
age            0.05037      0.05096          1.17072
ndrugtx       -0.06151     -0.06320          2.74369
_Iivhx_2      -0.60333     -0.59287         -1.73323
_Iivhx_3      -0.73272     -0.76004          3.72885
race           0.22613      0.20811         -7.96915
site           0.14859      -                -
treat          0.44250      0.43896         -0.80092
$\text{Delta Beta hat}(\%) = 100 \times \dfrac{\hat{\beta}_{reduce\ model} - \hat{\beta}_{full\ model}}{\hat{\beta}_{full\ model}}$
- When $\Delta\hat{\beta}\%$ < 20%, the variable can be removed.
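The delta-beta-hat-percent column can be recomputed from the two fitted models. The following Python sketch uses the full-precision coefficients from the Stata output; the function name delta_beta_hat_pct is a hypothetical helper, not something from the lecture.

```python
def delta_beta_hat_pct(full, reduced):
    """Percent change in each coefficient when moving from the full to the reduced model."""
    return {
        name: 100.0 * (reduced[name] - full[name]) / full[name]
        for name in reduced
        if name in full
    }

# Coefficients from the model with site (full) and without site (reduced).
full = {"age": 0.0503708, "ndrugtx": -0.0615121, "_Iivhx_2": -0.6033296,
        "_Iivhx_3": -0.732722, "race": 0.2261295, "treat": 0.4425031, "site": 0.1485845}
reduced = {"age": 0.0509605, "ndrugtx": -0.0631998, "_Iivhx_2": -0.5928725,
           "_Iivhx_3": -0.7600441, "race": 0.2081089, "treat": 0.438959}

changes = delta_beta_hat_pct(full, reduced)
for name, pct in changes.items():
    print(f"{name:10s} {pct:9.5f}  {'> 20%' if abs(pct) > 20 else 'ok'}")
```

Every absolute change is below 20%, which is the condition under which the removed variable would not be treated as a confounder by this criterion.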
Change in the coefficient and in the effect estimate (EE, the odds ratio) of the main effect variable treat when each variable is removed
Variable removed   Coefficient   Standard error   Odds ratio   95% CI of odds ratio   Δβ̂ (%)    ΔEE (%)
Adjusted all        .4425031       .1992909        1.556599    1.053271, 2.300453
-Age                .4116853       .1974029        1.509359    1.025092, 2.222400     -6.964     -3.035
-ndrugtx            .4312865       .1981066        1.539236    1.043943, 2.269520     -2.535     -1.115
-race               .4553245       .1987689        1.576685    1.067954, 2.327755      2.897      1.290
-site               .4389590       .1991429        1.551092    1.049849, 2.291650     -0.801     -0.354
-ivhx2              .4376198       .1969050        1.549016    1.053052, 2.278566     -1.1036    -0.487
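The change-in-effect-estimate column applies the same percent-change idea to the odds ratio of treat rather than to its coefficient. A Python sketch, assuming the 10% cutoff quoted earlier; the function name pct_change_in_or is hypothetical.

```python
import math

def pct_change_in_or(coef_full, coef_reduced):
    """Percent change in the odds ratio between the full and a reduced model."""
    or_full = math.exp(coef_full)
    or_reduced = math.exp(coef_reduced)
    return 100.0 * (or_reduced - or_full) / or_full

TREAT_FULL = 0.4425031  # coefficient of treat adjusted for all variables
# Coefficient of treat after removing each variable in turn (from the table).
treat_reduced = {"age": 0.4116853, "ndrugtx": 0.4312865, "race": 0.4553245,
                 "site": 0.4389590, "ivhx2": 0.4376198}

or_changes = {name: pct_change_in_or(TREAT_FULL, b) for name, b in treat_reduced.items()}
for name, pct in or_changes.items():
    print(f"-{name:8s} {pct:7.3f}  {'> 10%' if abs(pct) > 10 else 'ok'}")
```

None of the changes reaches 10%, so by this criterion no single variable materially alters the estimated effect of treat.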
. xi: logit dfree age ndrugtx i.ivhx race treat site beck
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -310.17972
Iteration 2:   log likelihood = -309.62533
Iteration 3:   log likelihood = -309.6238
Iteration 4:   log likelihood = -309.6238
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.6238                        Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0504143   .0174058     2.90   0.004     .0162995     .084529
     ndrugtx |  -.0615329   .0256457    -2.40   0.016    -.1117975   -.0112682
    _Iivhx_2 |  -.6036962   .2875987    -2.10   0.036    -1.167379   -.0400131
    _Iivhx_3 |  -.7336591   .2549904    -2.88   0.004    -1.233431   -.2338871
        race |   .2260262   .2233692     1.01   0.312    -.2117694    .6638218
       treat |   .4424802   .1992933     2.22   0.026     .0518725     .833088
        site |   .1489209   .2176073     0.68   0.494    -.2775816    .5754234
        beck |   .0002759   .0107983     0.03   0.980    -.0208883    .0214402
       _cons |  -2.411128   .5983465    -4.03   0.000    -3.583866   -1.238391
------------------------------------------------------------------------------
Step 4: Add each variable not selected in Step 1 to the model.
Step 5: Examine more closely the variables in the model. Refer to the model at the end of Step 5 as the main effects model.
Checking the linearity of continuous variables
Methods:
- Plot the smoothed logit against the continuous variable
- Plot the coefficients against the continuous variable, after dividing the continuous variable into 4 groups by quartiles
- Fractional polynomial
. do "G:\hosmer_data\logistic\plot_smooth_logit_age.do"
. lowess dfree age, gen(var3) logit nodraw
. graph twoway line var3 age, sort xlabel(20(10)50 56)
- Plot of the smoothed logit against the continuous variable
- Plot of the coefficients against the continuous variable, with the continuous variable divided into 4 groups by quartiles
. xtile age1 = age, nq(4)
. tabstat age, statistics(median) by(age1) columns(variables)
Summary statistics: p50
  by categories of: age1 (4 quantiles of age)
    age1 |       age
---------+----------
       1 |        25
       2 |        30
       3 |        35
       4 |        40
---------+----------
   Total |        32
--------------------
. xi: logit dfree i.age1 ndrugtx i.ivhx race treat site
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(9)      =      34.69
                                                  Prob > chi2     =     0.0001
Log likelihood = -309.52103                       Pseudo R2       =     0.0531
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iage1_2 |   -.165864   .2909137    -0.57   0.569    -.7360444    .4043163
    _Iage1_3 |   .4693399     .27066     1.73   0.083    -.0611439    .9998237
    _Iage1_4 |    .595771   .3124964     1.91   0.057    -.0167108    1.208253
     ndrugtx |  -.0587551   .0254688    -2.31   0.021     -.108673   -.0088371
    _Iivhx_2 |  -.5545193   .2853626    -1.94   0.052     -1.11382    .0047811
    _Iivhx_3 |  -.6725536   .2518601    -2.67   0.008     -1.16619   -.1789169
        race |   .2787172   .2238499     1.25   0.213    -.1600205    .7174549
       treat |   .4430577   .2000427     2.21   0.027     .0509812    .8351343
        site |   .1582001   .2188293     0.72   0.470    -.2706974    .5870976
       _cons |  -1.054837   .2705875    -3.90   0.000    -1.585179   -.5244956
------------------------------------------------------------------------------
. clear
. input age coef
           age       coef
  1. 25 0
  2. 30 -.165864
  3. 35 .4693399
  4. 40 .595771
  5. end
. graph twoway scatter coef age, connect(l) ylabel(-.25(.25).75) xlabel(20(10)50) yline(0)
Fractional polynomial analysis
- Fractional polynomial modelling builds a model relating an outcome variable to an independent variable measured on a continuous or ordinal scale. It was proposed by Royston & Altman (1994).
- When a variable is not linear, that is, when its relationship with the outcome is non-linear, transform the variable by raising it to some power.
- The resulting models have names such as first-order fractional polynomial (fp1), and so on.
- A first-order fractional polynomial transforms x to x^p, with the power p chosen from -2, -1, -0.5, 0, 0.5, 1, 2, 3.
- When p = 0, x^p is taken to be log x. This group therefore contains 8 possible transformations.
- A second-order fractional polynomial (fp2) transforms x using a suitable pair of powers (p1, p2). This group contains 36 possible pairs, which together with the 8 first-order models gives the 44 models reported by Stata.
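Powers of first- and second-order fractional polynomials
 FP1    |                 FP2
 Power  |     Power     |     Power     |     Power
 p      |   p1     p2   |   p1     p2   |   p1     p2
 -2     |   -2     -2   |   -1      1   |    0      2
 -1     |   -2     -1   |   -1      2   |    0      3
 -0.5   |   -2   -0.5   |   -1      3   |  0.5    0.5
 0      |   -2      0   | -0.5   -0.5   |  0.5      1
 0.5    |   -2    0.5   | -0.5      0   |  0.5      2
 1      |   -2      1   | -0.5    0.5   |  0.5      3
 2      |   -2      2   | -0.5      1   |    1      1
 3      |   -2      3   | -0.5      2   |    1      2
        |   -1     -1   | -0.5      3   |    1      3
        |   -1   -0.5   |    0      0   |    2      2
        |   -1      0   |    0    0.5   |    2      3
        |   -1    0.5   |    0      1   |    3      3
First order (FP1): 8 powers
Second order (FP2): 36 power pairs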
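The candidate power sets can be enumerated directly, which also confirms the counts in the table. The following Python sketch is illustrative only: the names POWERS and fp_terms are hypothetical, and the rule for a repeated power (multiply the second term by ln x) is the standard fractional polynomial convention, the same one visible in Stata's generated term X^-1*ln(X) for powers -1 -1.

```python
import math
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

fp1 = [(p,) for p in POWERS]                          # first-order candidates
fp2 = list(combinations_with_replacement(POWERS, 2))  # second-order candidates
print(len(fp1), len(fp2), len(fp1) + len(fp2))

def fp_terms(x, powers):
    """Transformed terms of x (x > 0) for a tuple of fractional polynomial powers."""
    terms = []
    previous = None
    for p in powers:
        if p == previous:
            # Repeated power: the next term is the previous term times ln(x).
            term = terms[-1] * math.log(x)
        else:
            term = math.log(x) if p == 0 else x ** p
        terms.append(term)
        previous = p
    return terms

print(fp_terms(2.0, (-1, -1)))
```

The counts 8, 36 and 44 match the "8 models fit" and "44 models fit" messages in the fracpoly output.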
Transforming a continuous variable with fractional polynomials: scaling and/or centering
- When a continuous variable is not linear, its fractional polynomial transformation can be specified in several ways, for example by scaling and/or by centering.
Fractional polynomial with scaling
- Scaling can be done in several ways. Stata, for example, defines it as follows:
lrange = log10[max(x) - min(x)]
scale = 10^(sign(lrange) × int(|lrange|))
x* = x/scale
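The scaling rule is easy to try on a concrete range. In the Python sketch below, fp_scale is a hypothetical helper and the age range 20 to 56 is an assumption read off the axis labels of the lowess plot, not a value stated in the output.

```python
import math

def fp_scale(x_min, x_max):
    """Scale factor 10^(sign(lrange) * int(|lrange|)) with lrange = log10(max - min)."""
    lrange = math.log10(x_max - x_min)
    sign = (lrange > 0) - (lrange < 0)
    return 10.0 ** (sign * int(abs(lrange)))

# Assumed age range of 20 to 56 years (illustrative values).
print(fp_scale(20, 56))
```

A range of 36 gives lrange of about 1.56 and hence a scale of 10, which is consistent with the note "where: X = age/10" in the fracpoly output.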
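Fractional polynomial with centering
- For example, the transformed variable $x_1$ is denoted $x_1^*$.
- For an FP1 transformation, the model $\hat{y}_i = \beta_0 + \beta_1 x_1$ becomes $\hat{y}_i = \beta_0^* + \beta_1^*\left(x_1^{*p} - \bar{x}_1^{*p}\right)$, where $\bar{x}_1^* = \dfrac{1}{n}\sum_{i=1}^{n} x_{1i}^*$.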
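The centering constants that Stata prints in its "gen double" lines can be reproduced from the sample mean. This Python sketch assumes the mean age 32.3826087 shown in the mfp output and a scale of 10; the function name fp_center_constant is hypothetical.

```python
import math

def fp_center_constant(x_mean, p, scale=1.0):
    """Value subtracted from the transformed variable: (mean / scale) raised to p."""
    m = x_mean / scale
    return math.log(m) if p == 0 else m ** p

MEAN_AGE = 32.3826087  # from "gen double Iage__1 = age-32.3826087"

print(fp_center_constant(MEAN_AGE, 3, scale=10))   # compare with X^3-33.95748331
print(fp_center_constant(MEAN_AGE, -2, scale=10))  # compare with X^-2-.0953622163
```

The two values agree with the constants in the generated terms Iage__2 and Iage__1 of the degree(2) model.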
Fractional polynomial analysis
- A model is chosen by looking at the difference in deviance between the models being compared, as follows:
$G(1, p_1) = -2\{L(1) - L(p_1)\}$ (df = 1)
$G(p_1, (p_1, p_2)) = -2\{L(p_1) - L(p_1, p_2)\}$ (df = 2)
- The difference in deviance is approximately chi-square distributed with df = df(model2) - df(model1).
- However, a fractional polynomial model is difficult to interpret. One remedy is to group the continuous variable into suitable categories, guided by theory and previous research. Cut points at the median or the quartiles must be used with caution, because categorizing a continuous variable can lead to erroneous conclusions (Heinzl H., 2000; Royston P., Altman D.G., Sauerbrei W., 2006).
Fractional polynomial
. use "H:\hosmer_data\logistic\uis.dta", clear
. xi: fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
........
-> gen double Iage__1 = X^-2-.0953622163 if e(sample)
-> gen double Iage__2 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
…
Iteration 4:   log likelihood = -309.38436
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.38436                       Pseudo R2       =     0.0535
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |  -1.538626   4.575934    -0.34   0.737    -10.50729     7.43004
     Iage__2 |   .0116581   .0080977     1.44   0.150    -.0042132    .0275293
    Indru__1 |  -.0620596   .0257223    -2.41   0.016    -.1124744   -.0116447
    _Iivhx_2 |  -.6057376   .2881578    -2.10   0.036    -1.170517   -.0409587
    _Iivhx_3 |  -.7263554   .2525832    -2.88   0.004    -1.221409   -.2313014
        race |   .2282107    .224089     1.02   0.308    -.2109957    .6674171
       treat |   .4392589   .1996983     2.20   0.028     .0478573    .8306604
        site |   .1459101    .217491     0.67   0.502    -.2803644    .5721846
       _cons |  -1.082342   .2416317    -4.48   0.000    -1.555931   -.6087524
------------------------------------------------------------------------------
Deviance: 618.77. Best powers of age among 44 models fit: -2 3.
1. With power = 1 (age linear): comparing the model that contains age with the model without age gives p-value = .003 (df = 1-0).
2. Comparing linear age with (age^-2 and age^3): not significant (Dev. dif. = 619.248-618.769 = 0.480; p-value = 0.923; df = 4-1).
3. Comparing age^3 with (age^-2 and age^3): not significant (Dev. dif. = 618.882-618.769 = 0.114; p-value = 0.945; df = 4-2).
First order (FP1) Second order (FP2)
Fractional polynomial model comparisons:
---------------------------------------------------------------
age             df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0       627.801       9.032      0.060
Linear           1       619.248       0.480      0.923   1
m = 1            2       618.882       0.114      0.945   3
m = 2            4       618.769         --         --    -2 3
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
. di chiprob(4-1,619.248-618.769)
.9234802
. di chiprob(4-2,618.882-618.769)
.94506648
Deciding whether any fractional polynomial model is better than the linear model:
$G(1, (p_1, p_2)) = -2\{L(1) - L(p_1, p_2)\} = 619.248 - 618.769 = 0.480$; p-value = 0.923
Choose the linear model.
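The chiprob() checks can be repeated without Stata. The Python sketch below uses closed-form chi-square tail probabilities, which exist for small degrees of freedom; the function name chi2_tail is hypothetical and only df from 1 to 4 are covered, which is all the fractional polynomial comparisons need.

```python
import math

def chi2_tail(df, x):
    """Upper-tail chi-square probability for df = 1, 2, 3 or 4 (closed forms)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(h))
    if df == 2:
        return math.exp(-h)
    if df == 3:
        return math.erfc(math.sqrt(h)) + 2.0 * math.sqrt(h / math.pi) * math.exp(-h)
    if df == 4:
        return math.exp(-h) * (1.0 + h)
    raise ValueError("only df = 1..4 are implemented in this sketch")

# Deviances from the model comparison table.
dev_linear, dev_fp1, dev_fp2 = 619.248, 618.882, 618.769

print(chi2_tail(4 - 1, dev_linear - dev_fp2))  # linear vs FP2
print(chi2_tail(4 - 2, dev_fp1 - dev_fp2))     # FP1 vs FP2
print(chi2_tail(2 - 1, dev_linear - dev_fp1))  # linear vs FP1
```

All three p-values are far above 0.05, which supports keeping age linear.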
G(1,p1) = -2{L(1) - L(p1)}=619.248-618.882=.366;p-value=.545
STATA10
First order m=1 (FP1) Second order m=2 (FP2)
. use "I:\hosmer_data\logistic\uis.dta", clear
. xi: fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(1) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
-> gen double Iage__1 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
...
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |   .0138939   .0046486     2.99   0.003     .0047829     .023005
    Indru__1 |  -.0620649   .0257325    -2.41   0.016    -.1124997   -.0116301
    _Iivhx_2 |  -.5960999   .2868616    -2.08   0.038    -1.158338   -.0338615
    _Iivhx_3 |   -.714141   .2499592    -2.86   0.004    -1.204052     -.22423
        race |   .2355037   .2230028     1.06   0.291    -.2015736    .6725811
       treat |   .4348659   .1992503     2.18   0.029     .0443425    .8253893
        site |   .1436801   .2173756     0.66   0.509    -.2823683    .5697285
       _cons |  -1.113293   .2236989    -4.98   0.000    -1.551734   -.6748509
------------------------------------------------------------------------------
Deviance: 618.88. Best powers of age among 8 models fit: 3.
Fractional polynomial model comparisons:
---------------------------------------------------------------
age             df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0       627.801       8.918      0.012
Linear           1       619.248       0.366      0.545   1
m = 1            2       618.882         --         --    3
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 1 model
. di chiprob(2-1,619.248-618.882)
.54519273
. xi: fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
........
-> gen double Iage__1 = X^-2-.0953622163 if e(sample)
-> gen double Iage__2 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -309.95259
Iteration 2:   log likelihood = -309.38924
Iteration 3:   log likelihood = -309.38436
Iteration 4:   log likelihood = -309.38436
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.38436                       Pseudo R2       =     0.0535
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |  -1.538626   4.575934    -0.34   0.737    -10.50729     7.43004
     Iage__2 |   .0116581   .0080977     1.44   0.150    -.0042132    .0275293
    Indru__1 |  -.0620596   .0257223    -2.41   0.016    -.1124744   -.0116447
    _Iivhx_2 |  -.6057376   .2881578    -2.10   0.036    -1.170517   -.0409587
    _Iivhx_3 |  -.7263554   .2525832    -2.88   0.004    -1.221409   -.2313014
        race |   .2282107    .224089     1.02   0.308    -.2109957    .6674171
       treat |   .4392589   .1996983     2.20   0.028     .0478573    .8306604
        site |   .1459101    .217491     0.67   0.502    -.2803644    .5721846
       _cons |  -1.082342   .2416317    -4.48   0.000    -1.555931   -.6087524
------------------------------------------------------------------------------
Deviance: 618.77. Best powers of age among 44 models fit: -2 3.
Variable ndrugtx
. lowess dfree ndrugtx, gen(var2) logit nodraw
. graph twoway line var2 ndrugtx, sort xlabel(20(10)50 56)
. xi: fracpoly logit dfree ndrugtx age i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Iage__1 = age-32.3826087 if e(sample)
........
-> gen double Indru__1 = X^-1-1.804204581 if e(sample)
-> gen double Indru__2 = X^-1*ln(X)+1.064696882 if e(sample)
   (where: X = (ndrugtx+1)/10)
…
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Indru__1 |    .981453   .2888487     3.40   0.001     .4153199    1.547586
    Indru__2 |   .3611251   .1098594     3.29   0.001     .1458047    .5764455
     Iage__1 |   .0544455   .0174877     3.11   0.002     .0201702    .0887208
    _Iivhx_2 |  -.6088269   .2911069    -2.09   0.036    -1.179386   -.0382679
    _Iivhx_3 |  -.7238122   .2555649    -2.83   0.005     -1.22471   -.2229142
        race |   .2477026   .2242156     1.10   0.269    -.1917519    .6871571
       treat |   .4223666   .2003655     2.11   0.035     .0296574    .8150759
        site |   .1732142   .2209763     0.78   0.433    -.2598915    .6063198
       _cons |  -1.164471   .2454825    -4.74   0.000    -1.645608   -.6833343
------------------------------------------------------------------------------
Deviance: 613.45. Best powers of ndrugtx among 44 models fit: -1 -1.
Fractional polynomial model comparisons:
---------------------------------------------------------------
ndrugtx         df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0       626.176      12.725      0.013
Linear           1       619.248       5.797      0.122   1
m = 1            2       618.818       5.367      0.068   .5
m = 2            4       613.451         --         --    -1 -1
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
Variable ndrugtx: using the multivariable fractional polynomial command (mfp)
. xi: mfp logit dfree ndrugtx age i.ivhx race treat site
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
Deviance for model with all terms untransformed = 619.248, 575 observations
Variable     Model (vs.)   Deviance  Dev diff.   P      Powers   (vs.)
----------------------------------------------------------------------
age          lin.   FP2    619.248     0.480   0.923    1        -2 3
             Final         619.248                      1
[_Iivhx_3 included with 1 df in model]
ndrugtx      lin.   FP2    619.248     5.797   0.122    1        -1 -1
             Final         619.248                      1
[treat included with 1 df in model]
[_Iivhx_2 included with 1 df in model]
[race included with 1 df in model]
[site included with 1 df in model]
Fractional polynomial fitting algorithm converged after 1 cycle.
Transformations of covariates:
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
-> gen double Iage__1 = age-32.3826087 if e(sample)
Final multivariable fractional polynomial model for dfree
--------------------------------------------------------------------
    Variable |    -----Initial-----          -----Final-----
             |   df     Select   Alpha    Status    df    Powers
-------------+------------------------------------------------------
     ndrugtx |    4     1.0000   0.0500     in      1     1
         age |    4     1.0000   0.0500     in      1     1
    _Iivhx_2 |    1     1.0000   0.0500     in      1     1
    _Iivhx_3 |    1     1.0000   0.0500     in      1     1
        race |    1     1.0000   0.0500     in      1     1
       treat |    1     1.0000   0.0500     in      1     1
        site |    1     1.0000   0.0500     in      1     1
--------------------------------------------------------------------
Logistic regression                               Number of obs   =        575
                                                  LR chi2(7)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.62413                       Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Indru__1 |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
     Iage__1 |   .0503708   .0173224     2.91   0.004     .0164196     .084322
    _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
    _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
        race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
       treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
        site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
       _cons |  -1.053693   .2264488    -4.65   0.000    -1.497524   -.6098613
------------------------------------------------------------------------------
Deviance: 619.248.
18
Step 6: Check for interactions among the variables in the
model. Only consider the statistical significance of
interactions and as such, they must contribute to
the model at traditional levels, such as 5% or even 1%.
Creating interaction variables
- Build the model containing the main effects (from Step 5).
- Add the interaction terms. Whenever a higher-order term is in the model,
all of its lower-order component terms must be in the model as well. Such a
model is called a "hierarchically well-formulated (HWF) model".
- For example, with a third-order term:
logit P(X) = x1 + x2 + x3 + x1*x2 + x1*x3 + x2*x3 + x1*x2*x3
logit P(X) = x1 + x2 + x3 + x2*x3 + x1*x2*x3 (incorrect: x1*x2 and x1*x3 are missing)
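One way to avoid an incomplete hierarchy is to let Stata build the terms. The sketch below assumes Stata 11 or later (factor-variable notation) and uses simulated data; the names y, x1, x2 and x3 are hypothetical and are not variables from the slides.

```stata
* Simulated data for illustration only (not the data analysed in these slides)
clear
set seed 12345
set obs 500
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = runiform() < invlogit(-0.5 + 0.4*x1 + 0.3*x2 + 0.2*x1*x2)

* c.x1##c.x2##c.x3 expands to the three main effects, the three two-way
* products and the three-way product, so the fitted model is
* hierarchically well formulated by construction
logit y c.x1##c.x2##c.x3
```

With the `##` operator the lower-order terms cannot be left out by accident, which is the situation marked as incorrect above.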
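Interaction assessment
- Test with the Z-test (Wald)
- Test with the likelihood ratio (LR) test
$$\text{Wald test:}\quad Z = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)}$$
$$\text{LR test:}\quad LR = (-2\ln L_{reduced}) - (-2\ln L_{full}) = -2\,(\ln L_{reduced} - \ln L_{full})$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
$$H_0: \beta_3 = 0; \qquad H_1: \beta_3 \neq 0$$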
. gen tage= treat* age
. logit dfree treat age tage
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -322.31165
Iteration 2:   log likelihood = -322.26464
Iteration 3:   log likelihood = -322.26464

Logistic regression                               Number of obs   =        575
                                                  LR chi2(3)      =       9.20
                                                  Prob > chi2     =     0.0268
Log likelihood = -322.26464                       Pseudo R2       =     0.0141

-----------------------------------------------------------------------------
       dfree |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
       treat |  -1.123388   1.042136    -1.08   0.281    -3.165936   .9191606
         age |  -.0077915   .0238604    -0.33   0.744    -.0545571   .0389741
        tage |   .0480969   .0314183     1.53   0.126    -.0134819   .1096756
       _cons |  -1.043996   .7884888    -1.32   0.185    -2.589406   .5014138
-----------------------------------------------------------------------------
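The Wald statistic for the product term can be checked by hand from the coefficient and standard error printed for tage. This is only an arithmetic sketch; the scalar names b, se and z are introduced here for illustration.

```stata
* Wald z for the treat-by-age product term (tage), using the printed estimates
scalar b  = .0480969
scalar se = .0314183
scalar z  = scalar(b)/scalar(se)
display "z = " %5.2f scalar(z)
* Two-sided p-value from the standard normal distribution
display "two-sided p = " %5.3f 2*normal(-abs(scalar(z)))
```

The two displayed values should agree with the z and P>|z| columns of the tage row (1.53 and 0.126).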
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3
. di "Log likelihood = " e(ll)
Log likelihood = -306.72558
. estimates store A

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1
. estimates store B1
. di "Log likelihood = " e(ll)
Log likelihood = -302.87416
. lrtest A B1
Likelihood-ratio test                                 LR chi2(1)  =      7.70
(Assumption: A nested in B1)                          Prob > chi2 =    0.0055

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp2
. estimates store B2
. di "Log likelihood = " e(ll)
Log likelihood = -303.03684
. lrtest A B2
Likelihood-ratio test                                 LR chi2(1)  =      7.38
(Assumption: A nested in B2)                          Prob > chi2 =    0.0066

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ageivhx2
. estimates store C
. di "Log likelihood = " e(ll)
Log likelihood = -306.36027
. lrtest A C
Likelihood-ratio test                                 LR chi2(1)  =      0.73
(Assumption: A nested in C)                           Prob > chi2 =    0.3927

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ageivhx3
. estimates store D
. di "Log likelihood = " e(ll)
Log likelihood = -306.68672
. lrtest A D
Likelihood-ratio test                                 LR chi2(1)  =      0.08
(Assumption: A nested in D)                           Prob > chi2 =    0.7804

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 agerace
. estimates store E
. di "Log likelihood = " e(ll)
Log likelihood = -306.6269
. lrtest A E
Likelihood-ratio test                                 LR chi2(1)  =      0.20
(Assumption: A nested in E)                           Prob > chi2 =    0.6569

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 agetreat
. estimates store F
. di "Log likelihood = " e(ll)
Log likelihood = -305.34312
. lrtest A F
Likelihood-ratio test                                 LR chi2(1)  =      2.76
(Assumption: A nested in F)                           Prob > chi2 =    0.0964

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 agesite
. estimates store G
. di "Log likelihood = " e(ll)
Log likelihood = -305.92657
. lrtest A G
Likelihood-ratio test                                 LR chi2(1)  =      1.60
(Assumption: A nested in G)                           Prob > chi2 =    0.2062

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp1_ivhx2
. estimates store F11
. di "Log likelihood = " e(ll)
Log likelihood = -305.61857
. lrtest A F11
Likelihood-ratio test                                 LR chi2(1)  =      2.21
(Assumption: A nested in F11)                         Prob > chi2 =    0.1368

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp1_ivhx3
. estimates store F12
. di "Log likelihood = " e(ll)
Log likelihood = -306.66329
. lrtest A F12
Likelihood-ratio test                                 LR chi2(1)  =      0.12
(Assumption: A nested in F12)                         Prob > chi2 =    0.7241
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp1_race
. estimates store F13
. di "Log likelihood = " e(ll)
Log likelihood = -306.32029
. lrtest A F13
Likelihood-ratio test                                 LR chi2(1)  =      0.81
(Assumption: A nested in F13)                         Prob > chi2 =    0.3679

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp1_treat
. estimates store F14
. di "Log likelihood = " e(ll)
Log likelihood = -305.28879
. lrtest A F14
Likelihood-ratio test                                 LR chi2(1)  =      2.87
(Assumption: A nested in F14)                         Prob > chi2 =    0.0900

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp2_ivhx2
. estimates store F21
. di "Log likelihood = " e(ll)
Log likelihood = -305.43893
. lrtest A F21
Likelihood-ratio test                                 LR chi2(1)  =      2.57
(Assumption: A nested in F21)                         Prob > chi2 =    0.1087

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp2_ivhx3
. estimates store F22
. di "Log likelihood = " e(ll)
Log likelihood = -306.71318
. lrtest A F22
Likelihood-ratio test                                 LR chi2(1)  =      0.02
(Assumption: A nested in F22)                         Prob > chi2 =    0.8749

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp2_race
. estimates store F23
. di "Log likelihood = " e(ll)
Log likelihood = -306.49124
. lrtest A F23
Likelihood-ratio test                                 LR chi2(1)  =      0.47
(Assumption: A nested in F23)                         Prob > chi2 =    0.4936

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ndrugfp2_treat
. estimates store F24
. di "Log likelihood = " e(ll)
Log likelihood = -305.25896
. lrtest A F24
Likelihood-ratio test                                 LR chi2(1)  =      2.93
(Assumption: A nested in F24)                         Prob > chi2 =    0.0868

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 raceivhx2
. estimates store M
. di "Log likelihood = " e(ll)
Log likelihood = -306.27003
. lrtest A M
Likelihood-ratio test                                 LR chi2(1)  =      0.91
(Assumption: A nested in M)                           Prob > chi2 =    0.3398

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 raceivhx3
. estimates store N
. di "Log likelihood = " e(ll)
Log likelihood = -306.04202
. lrtest A N
Likelihood-ratio test                                 LR chi2(1)  =      1.37
(Assumption: A nested in N)                           Prob > chi2 =    0.2423

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 racetreat
. estimates store O
. di "Log likelihood = " e(ll)
Log likelihood = -306.25412
. lrtest A O
Likelihood-ratio test                                 LR chi2(1)  =      0.94
(Assumption: A nested in O)                           Prob > chi2 =    0.3315

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 racesite
. estimates store P
. di "Log likelihood = " e(ll)
Log likelihood = -302.45334
. lrtest A P
Likelihood-ratio test                                 LR chi2(1)  =      8.54
(Assumption: A nested in P)                           Prob > chi2 =    0.0035

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 treativhx2
. estimates store Q
. di "Log likelihood = " e(ll)
Log likelihood = -306.70711
. lrtest A Q
Likelihood-ratio test                                 LR chi2(1)  =      0.04
(Assumption: A nested in Q)                           Prob > chi2 =    0.8476

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 treativhx3
. estimates store R
. di "Log likelihood = " e(ll)
Log likelihood = -306.72555
. lrtest A R
Likelihood-ratio test                                 LR chi2(1)  =      0.00
(Assumption: A nested in R)                           Prob > chi2 =    0.9941

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 treatsite
. estimates store S
. di "Log likelihood = " e(ll)
Log likelihood = -306.70871
. lrtest A S
Likelihood-ratio test                                 LR chi2(1)  =      0.03
(Assumption: A nested in S)                           Prob > chi2 =    0.8543

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 siteivhx2
. estimates store T
. di "Log likelihood = " e(ll)
Log likelihood = -306.63454
. lrtest A T
Likelihood-ratio test                                 LR chi2(1)  =      0.18
(Assumption: A nested in T)                           Prob > chi2 =    0.6696

. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 siteivhx3
. estimates store U
. di "Log likelihood = " e(ll)
Log likelihood = -306.30032
. lrtest A U
Likelihood-ratio test                                 LR chi2(1)  =      0.85
(Assumption: A nested in U)                           Prob > chi2 =    0.3564
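Each lrtest result in this log can be reproduced from the two log likelihoods alone. The sketch below does this for the age x ndrugfp1 term, using the values printed for models A and B1; the scalar names are introduced here for illustration.

```stata
* Likelihood ratio statistic G = -2*(lnL_reduced - lnL_full)
scalar ll_A  = -306.72558
scalar ll_B1 = -302.87416
scalar G     = -2*(scalar(ll_A) - scalar(ll_B1))
display "G = " %5.2f scalar(G)
* Upper-tail chi-squared probability with 1 df (one added term)
display "p = " %6.4f chi2tail(1, scalar(G))
```

The displayed values should match the LR chi2(1) and Prob > chi2 reported by `lrtest A B1` (7.70 and 0.0055).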
interaction               Log likelihood        G    df   P value
main-effects model          -306.72558
age x ndrugfp1              -302.87416       7.70     1    0.0055
age x ndrugfp2              -303.03684       7.38     1    0.0066
age x _Iivhx_2              -306.36027       0.73     1    0.3927
age x _Iivhx_3              -306.68672       0.08     1    0.7804
age x race                  -306.6269        0.20     1    0.6569
age x treat                 -305.34312       2.76     1    0.0964
age x site                  -305.92657       1.60     1    0.2062
ndrugfp1 x _Iivhx_2         -305.61857       2.21     1    0.1368
ndrugfp1 x _Iivhx_3         -306.66329       0.12     1    0.7241
ndrugfp1 x race             -306.32029       0.81     1    0.3679
ndrugfp1 x treat            -305.28879       2.87     1    0.0900
ndrugfp1 x site             -306.72423       0.002    1    0.9586
ndrugfp2 x _Iivhx_2         -305.43893       2.57     1    0.1087
ndrugfp2 x _Iivhx_3         -306.71318       0.02     1    0.8749
ndrugfp2 x race             -306.49124       0.47     1    0.4936
ndrugfp2 x treat            -305.25896       2.93     1    0.0868
ndrugfp2 x site             -306.72408       0.003    1    0.9563
race x _Iivhx_2             -306.27003       0.91     1    0.3398
race x _Iivhx_3             -306.04202       1.37     1    0.2423
race x treat                -306.25412       0.94     1    0.3315
race x site                 -302.45334       8.54     1    0.0035
treat x _Iivhx_2            -306.70711       0.04     1    0.8476
treat x _Iivhx_3            -306.72555       0.00005  1    0.9941
treat x site                -306.70871       0.03     1    0.8543
site x _Iivhx_2             -306.63454       0.18     1    0.6696
site x _Iivhx_3             -306.30032       0.85     1    0.3564
Interaction terms are considered for entry into the model on the basis of
their p-values at the 0.10 significance level. The terms selected are
age*ndrugfp1, age*ndrugfp2, age*treat, ndrugfp1*treat, race*site
. xi:logit dfree age ndrugfp1 ndrugfp2 race treat site i.ivhx age_ndrugfp1 age_ndrugfp2 agetreat ndrugfp1_treat racesite
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(13)     =      59.70
                                                  Prob > chi2     =     0.0000
Log likelihood = -297.01266                       Pseudo R2       =     0.0913

-----------------------------------------------------------------------------
       dfree |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |   .1265633   .0749318     1.69   0.091    -.0203003   .2734269
    ndrugfp1 |   2.526072   1.596445     1.58   0.114    -.6029022   5.655047
    ndrugfp2 |    .744546   .6043186     1.23   0.218    -.4398966   1.928989
        race |   .6905325   .2667743     2.59   0.010     .1676645   1.213401
       treat |  -.6315667   1.236548    -0.51   0.610    -3.055156   1.792022
        site |   .4927464   .2565283     1.92   0.055    -.0100398   .9955326
    _Iivhx_2 |  -.6062768   .3000219    -2.02   0.043    -1.194309  -.0182447
    _Iivhx_3 |  -.6767542   .2629918    -2.57   0.010    -1.192209  -.1612997
age_ndrugfp1 |  -.0382959    .046105    -0.83   0.406      -.12866   .0520682
age_ndrugfp2 |  -.0088257   .0177002    -0.50   0.618    -.0435174    .025866
    agetreat |     .04181   .0338704     1.23   0.217    -.0245747   .1081947
ndrugfp1_t~t |   -.077965   .0731851    -1.07   0.287    -.2214052   .0654751
    racesite |   -1.34883   .5353968    -2.52   0.012    -2.398188  -.2994715
       _cons |  -7.458669    2.69444    -2.77   0.006    -12.73968  -2.177663
-----------------------------------------------------------------------------
The age x ndrugfp2 term has a Wald statistic of -0.50 and the largest p-value
(0.618). Remove this term from the model and refit.
. xi:logit dfree age ndrugfp1 ndrugfp2 race treat site i.ivhx age_ndrugfp1 agetreat ndrugfp1_treat racesite
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(12)     =      59.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -297.1368                        Pseudo R2       =     0.0909

-----------------------------------------------------------------------------
       dfree |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |   .0935724   .0349887     2.67   0.007     .0249957    .162149
    ndrugfp1 |    1.76005    .414738     4.24   0.000     .9471787   2.572922
    ndrugfp2 |   .4497105   .1176999     3.82   0.000     .2190229    .680398
        race |   .6782462   .2654307     2.56   0.011     .1580116   1.198481
       treat |  -.6081425   1.238307    -0.49   0.623     -3.03518   1.818895
        site |   .4852708   .2561725     1.89   0.058    -.0168181   .9873598
    _Iivhx_2 |  -.6123348   .2997954    -2.04   0.041    -1.199923  -.0247467
    _Iivhx_3 |  -.6811741   .2627758    -2.59   0.010    -1.196205   -.166143
age_ndrugfp1 |  -.0155282   .0061056    -2.54   0.011    -.0274949  -.0035615
    agetreat |   .0412879   .0339743     1.22   0.224    -.0253006   .1078764
ndrugfp1_t~t |  -.0783593   .0732363    -1.07   0.285    -.2218998   .0651812
    racesite |  -1.333799     .53443    -2.50   0.013    -2.381263  -.2863356
       _cons |  -6.319684   1.403203    -4.50   0.000    -9.069911  -3.569457
-----------------------------------------------------------------------------
The ndrugfp1 x treat term has a Wald statistic of -1.07 and p-value = 0.285, the largest
among the interaction terms in this model. Remove this term from the model and refit.
. xi:logit dfree age ndrugfp1 ndrugfp2 race treat site i.ivhx age_ndrugfp1 agetreat racesite
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(11)     =      58.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -297.71139                       Pseudo R2       =     0.0892

-----------------------------------------------------------------------------
       dfree |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |   .0889238   .0339956     2.62   0.009     .0222937   .1555539
    ndrugfp1 |   1.705601   .4106322     4.15   0.000     .9007769   2.510426
    ndrugfp2 |   .4440587   .1175928     3.78   0.000      .213581   .6745364
        race |   .6869266    .265402     2.59   0.010     .1667483   1.207105
       treat |  -1.252787   1.080874    -1.16   0.246    -3.371262   .8656875
        site |   .4903829   .2560081     1.92   0.055    -.0113838   .9921497
    _Iivhx_2 |  -.6299072   .2994363    -2.10   0.035    -1.216792  -.0430227
    _Iivhx_3 |   -.694879    .262544    -2.65   0.008    -1.209456  -.1803021
age_ndrugfp1 |  -.0155328   .0060924    -2.55   0.011    -.0274737  -.0035918
    agetreat |   .0515973   .0325362     1.59   0.113    -.0121726   .1153672
    racesite |  -1.401606   .5309161    -2.64   0.008    -2.442183  -.3610301
       _cons |  -5.976921   1.338859    -4.46   0.000    -8.601036  -3.352807
-----------------------------------------------------------------------------
The age x treat term has a Wald statistic of 1.59 and p-value = 0.113, the largest
among the interaction terms in this model. Remove this term from the model and refit.
** If a p-value criterion of 0.25 is used instead, keep it in the model.
. xi:logit dfree age ndrugfp1 ndrugfp2 race treat site i.ivhx age_ndrugfp1 racesite
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(10)     =      55.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -298.98146                       Pseudo R2       =     0.0853

-----------------------------------------------------------------------------
       dfree |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |   .1166385   .0288749     4.04   0.000     .0600446   .1732323
    ndrugfp1 |   1.669035    .407152     4.10   0.000      .871032   2.467038
    ndrugfp2 |   .4336886   .1169052     3.71   0.000     .2045586   .6628185
        race |   .6841068   .2641355     2.59   0.010     .1664107   1.201803
       treat |   .4349255   .2037596     2.13   0.033      .035564    .834287
        site |    .516201   .2548881     2.03   0.043     .0166295   1.015773
    _Iivhx_2 |  -.6346307   .2987192    -2.12   0.034    -1.220109  -.0491518
    _Iivhx_3 |  -.7049475   .2615805    -2.69   0.007    -1.217636  -.1922591
age_ndrugfp1 |  -.0152697   .0060268    -2.53   0.011    -.0270819  -.0034575
    racesite |  -1.429457   .5297806    -2.70   0.007    -2.467808  -.3911062
       _cons |  -6.843864   1.219316    -5.61   0.000     -9.23368  -4.454048
-----------------------------------------------------------------------------
- All variables now have p-values < 0.05. Carry this model forward
to the assessment of model fit.
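The removal of age x treat can also be checked with a likelihood ratio test instead of the Wald test. The sketch below is only arithmetic on the two log likelihoods printed above (the model with agetreat and the model without it); the scalar names are introduced here for illustration.

```stata
* LR test for dropping agetreat: compare the 11-term and 10-term models
scalar ll_with    = -297.71139
scalar ll_without = -298.98146
scalar G = -2*(scalar(ll_without) - scalar(ll_with))
display "G = " %5.2f scalar(G)
* One term removed, so 1 degree of freedom
display "p = " %6.4f chi2tail(1, scalar(G))
```

The resulting p-value is close to the Wald p-value of 0.113, so both tests point to the same decision at the 0.05 level.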
Step 7: Before any model becomes the final model we must
assess its adequacy and check its fit.
Computation and evaluation of overall measures of fit
- Pearson Chi-Square
- Hosmer-Lemeshow Test
- Classification Table
- Area Under the Receiver Operating Characteristic Curve
(ROC)
- Examination of other measures (R2)
Logistic Regression Diagnostics
Assessment of fit via External validation
Next ---> details
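A minimal sketch of the overall fit measures listed above, using Stata's postestimation commands after logit. It assumes internet access for `webuse lbw` (Stata's online copy of a low birth weight dataset, which appears to be the one used earlier in these slides) and Stata 11 or later for the `i.race` notation; it is not the model built in Step 6.

```stata
* Example data: low birth weight study (assumed available via webuse)
webuse lbw, clear
logit low age lwt i.race ftv

* Pearson chi-squared goodness-of-fit test
estat gof

* Hosmer-Lemeshow test with 10 groups
estat gof, group(10)

* Classification table (default cutoff 0.5)
estat classification

* Area under the ROC curve, without drawing the graph
lroc, nograph

* McFadden pseudo R2 stored by logit
display "Pseudo R2 = " %6.4f e(r2_p)
```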
Cautions in logistic regression analysis
- Collinearity or multicollinearity: high correlation among the
independent variables, which makes the coefficients unstable.
Remedies: ridge logistic regression, consider dropping variables,
or create new variables
- Influential observations (outliers)
- Zero cell or sparse data
- Problem of perfect or complete separation
- Overdispersion
Collinearity*
The correlation among the independent variables is high
(r2 > 0.90; r > 0.95; Kleinbaum, Muller, Nizam; 1998, 241)
Adding or removing a variable changes the coefficients
in magnitude and/or sign
R2 is high, yet the statistical tests of the individual coefficients
are not significant
Standard errors are inflated, which lowers test statistics such as t and z
and widens the confidence intervals of the coefficients
*Thai term taken from the Royal Institute Dictionary of Mathematical Terms, B.E. 2552 (2009)
. twocat 98 1 1 98
. logit y x1 x2 x3
-----------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
          x1 |   .2995896   1.429618     0.21   0.834    -2.502411    3.10159
          x2 |  -.0143819   1.429593    -0.01   0.992    -2.816334    2.78757
          x3 |   .3139715   .2886275     1.09   0.277    -.2517281   .8796711
       _cons |  -.3670144   .2425088    -1.51   0.130    -.8423228   .1082941
-----------------------------------------------------------------------------

. logit y x1 x3
-----------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
          x1 |   .2854983   .2861025     1.00   0.318    -.2752523   .8462489
          x3 |   .3136786   .2871556     1.09   0.275     -.249136   .8764931
       _cons |  -.3670266   .2425058    -1.51   0.130    -.8423293   .1082761
-----------------------------------------------------------------------------

. logit y x2 x3
-----------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
          x2 |    .279187   .2860666     0.98   0.329    -.2814933   .8398672
          x3 |   .3079032   .2871195     1.07   0.284    -.2548407   .8706471
       _cons |  -.3612278   .2408548    -1.50   0.134    -.8332944   .1108389
-----------------------------------------------------------------------------

. corr x1 x2 x3
(obs=198)

             |       x1       x2       x3
-------------+---------------------------
          x1 |   1.0000
          x2 |   0.9798   1.0000
          x3 |   0.0000   0.0203   1.0000

. collin x1 x2 x3

Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        x1     25.25    5.03    0.0396      0.9604
        x2     25.26    5.03    0.0396      0.9604
        x3      1.01    1.01    0.9897      0.0103
----------------------------------------------------
  Mean VIF     17.17

                           Cond
        Eigenval          Index
---------------------------------
    1     3.0422          1.0000
    2     0.6908          2.0985
    3     0.2570          3.4408
    4     0.0100         17.4460
---------------------------------
 Condition Number        17.4460
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0396
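The pattern in the output above (large standard errors when x1 and x2 enter together, ordinary ones when only one of them enters) can be reproduced on simulated data. This is a sketch under stated assumptions: the data are generated here and are not the author's x1, x2, x3, and x2 is constructed so that its correlation with x1 is about 0.98.

```stata
* Simulated data for illustration only
clear
set seed 2016
set obs 198
generate x1 = rnormal()
* x2 is x1 plus a little noise: corr(x1, x2) = 1/sqrt(1.04), about 0.98
generate x2 = x1 + 0.2*rnormal()
generate x3 = rnormal()
generate y  = runiform() < invlogit(-0.4 + 0.3*x1 + 0.3*x3)

correlate x1 x2 x3

* Both collinear predictors in the model: standard errors of x1 and x2 inflate
logit y x1 x2 x3

* Only one of the pair: the standard error returns to an ordinary size
logit y x1 x3

* VIF of x1 from the auxiliary regression of x1 on the other predictors
quietly regress x1 x2 x3
display "VIF(x1) = " %6.2f 1/(1-e(r2))
```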
Checking for collinearity or multicollinearity
Pearson Correlation (informal method)
- Examine the correlations among all variables using the Pearson correlation
and look for variables that are highly correlated with other variables.

. corr age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
(obs=575)

             |      age ndrugfp1 ndrugfp2     race    treat     site _Iivhx_2 _Iivhx_3 age_nd~1 racesite
-------------+------------------------------------------------------------------------------------------
         age |   1.0000
    ndrugfp1 |  -0.1836   1.0000
    ndrugfp2 |   0.1601  -0.9916   1.0000
        race |   0.0139   0.0874  -0.0821   1.0000
       treat |  -0.0446   0.0251  -0.0204   0.0791   1.0000
        site |  -0.0287   0.1923  -0.1926  -0.0795  -0.0230   1.0000
    _Iivhx_2 |   0.1063  -0.0551   0.0567  -0.0152   0.0513   0.1623   1.0000
    _Iivhx_3 |   0.2674  -0.3045   0.2843  -0.1806  -0.0695  -0.2292  -0.4138   1.0000
age_ndrugfp1 |   0.0462   0.9546  -0.9475   0.1080   0.0108   0.1833  -0.0134  -0.2506   1.0000
    racesite |   0.0430   0.1831  -0.1834   0.4384   0.0522   0.3849  -0.0303  -0.1295   0.2055   1.0000
Variance Inflation Factors (VIF: formal method)
A VIF > 10 and a mean VIF greater than 1 indicate
a multicollinearity problem.

. collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite

Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
       age      2.64    1.63    0.3782      0.6218
  ndrugfp1    105.68   10.28    0.0095      0.9905
  ndrugfp2     63.77    7.99    0.0157      0.9843
      race      1.43    1.20    0.6969      0.3031
     treat      1.02    1.01    0.9831      0.0169
      site      1.41    1.19    0.7090      0.2910
  _Iivhx_2      1.39    1.18    0.7201      0.2799
  _Iivhx_3      1.65    1.28    0.6061      0.3939
age_ndrugfp1   27.55    5.25    0.0363      0.9637
  racesite      1.64    1.28    0.6109      0.3891
----------------------------------------------------
  Mean VIF     20.82
- Generalized Variance Inflation Factor (GVIF)
How is the VIF calculated?

    r      r2      vif
   .1     0.01     1.01
   .2     0.04     1.04
   .3     0.09     1.10
   .4     0.16     1.19
   .5     0.25     1.33
   .6     0.36     1.56
   .7     0.49     1.96
   .8     0.64     2.78
   .9     0.81     5.26
   .91    0.83     5.82
   .92    0.85     6.51
   .93    0.86     7.40
   .94    0.88     8.59
   .95    0.90    10.26
   .96    0.92    12.76
   .97    0.94    16.92
   .98    0.96    25.25
   .99    0.98    50.25
   1      1.00      .

[Figure: relationship between VIF and the correlation r, with r = .95 marked]
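The variance inflation factor method
- Measures how much the estimated variance of a coefficient is
inflated compared with the case in which the independent variables
are not linearly related
$$VIF_i = \frac{1}{1 - R_i^2} = (1 - R_i^2)^{-1}$$
$$\overline{VIF} = \frac{\sum_{i=1}^{p-1} VIF_i}{p-1}$$
and
$$\text{tolerance}_i = 1 - R_i^2$$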
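The VIF and tolerance formulas can be evaluated directly for the simple case of two predictors, where R_i^2 is just the squared correlation r^2. The loop below is a sketch that regenerates a few rows of the r, r2, vif table; the list of r values is chosen here for illustration.

```stata
* VIF = 1/(1 - r^2) and tolerance = 1 - r^2 for selected correlations r
display "    r     r2      vif   tolerance"
foreach r of numlist 0.1(0.1)0.9 0.95 0.98 0.99 {
    display %5.2f `r' "  " %5.2f `r'^2 "  " %7.2f 1/(1-`r'^2) "  " %9.4f 1-`r'^2
}
```

At r = 0.98 this gives a VIF of about 25.25 and a tolerance of about 0.0396, the values reported by collin for x1 in the three-variable example.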
Indications of multicollinearity using variance inflation factors*
- VIF > 10 is an indication of multicollinearity
- The mean VIF provides information about the severity of the
multicollinearity
- A mean VIF > 1 is indicative of serious multicollinearity
problems
*Neter, Wasserman, Kutner (1987; p.392)
Marquardt (1970); Belsley, Kuh & Welsch (1980)
- tolerance < 0.20 or 0.10 and/or VIF > 5 or 10+ (O’Brien, 2007)
Stata
collin [varlist…]
estat vif     variance inflation factors for the
independent variables
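A usage sketch for `estat vif`. In official Stata this is a postestimation command for `regress`, so it is not available directly after `logit`; because the VIF depends only on the independent variables, one workaround is to fit a linear regression with the same predictors and request the VIFs from that. The example assumes internet access for `webuse lbw`; `collin` is a community-contributed command and is not used here.

```stata
* Example data (assumed available via webuse)
webuse lbw, clear

* The outcome is irrelevant for the VIF; only the predictor list matters
quietly regress low age lwt ftv
estat vif
```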
Condition Index & Variance Decomposition Proportion
The condition index (CI) and the variance decomposition
proportion (VDP) are computed from the eigenvalues of the
correlation matrix of the independent variables. The condition
index is calculated from
$$k = \frac{\text{Max eigenvalue}}{\text{Min eigenvalue}}; \qquad CI = \sqrt{k}$$
A condition index of 10-30 indicates collinearity
A condition index > 30 indicates a collinearity problem
A condition index > 100 indicates very severe collinearity
(Belsley, 1991a)
between 10 and 30, there is moderate to strong multicollinearity and
if it exceeds 30 there is severe multicollinearity. (Gujarati, 2002)
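As a check on the definition, the condition number reported by collin for the three-variable example (x1, x2, x3) can be recovered from the largest and smallest eigenvalues in that output. The sketch uses the eigenvalues as printed (rounded to four decimals), so the result agrees only approximately; the scalar names are introduced here for illustration.

```stata
* Largest and smallest eigenvalues from the collin output for x1 x2 x3
scalar lmax = 3.0422
scalar lmin = 0.0100
display "k  = max/min eigenvalue = " %8.2f scalar(lmax)/scalar(lmin)
display "CI = sqrt(k)            = " %8.2f sqrt(scalar(lmax)/scalar(lmin))
```

The square root is about 17.44, in line with the printed condition number of 17.4460.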
Condition Index & Variance Decomposition Proportion
The variance decomposition proportion was proposed by
Belsley et al. (1980) and Belsley (1991a)
Look for VDP values greater than 0.5
It computes the proportion of the variance of each coefficient
that is associated with each principal component, that is, a
decomposition of the coefficient variance for each dimension
$$p_{jk} = \frac{V_{jk}^2}{\lambda_k \, VIF_j}$$
(Fox, 1984)
. collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite

Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
       age      2.64    1.63    0.3782      0.6218
  ndrugfp1    105.68   10.28    0.0095      0.9905
  ndrugfp2     63.77    7.99    0.0157      0.9843
      race      1.43    1.20    0.6969      0.3031
     treat      1.02    1.01    0.9831      0.0169
      site      1.41    1.19    0.7090      0.2910
  _Iivhx_2      1.39    1.18    0.7201      0.2799
  _Iivhx_3      1.65    1.28    0.6061      0.3939
age_ndrugfp1   27.55    5.25    0.0363      0.9637
  racesite      1.64    1.28    0.6109      0.3891
----------------------------------------------------
  Mean VIF     20.82

                           Cond
        Eigenval          Index
---------------------------------
    1     5.9439          1.0000
    2     1.2749          2.1592
    3     1.0679          2.3592
    4     1.0129          2.4224
    5     0.7402          2.8338
    6     0.4588          3.5995
    7     0.3110          4.3716
    8     0.1469          6.3620
    9     0.0320         13.6289
    10    0.0094         25.1490
    11    0.0021         52.8408
---------------------------------
 Condition Number        52.8408
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0002
- Check the condition index and the variance decomposition proportions
- CI greater than 30, VDP greater than or equal to .5
. coldiag2 age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1, force w(5)
Condition number using scaled variables = 52.18
Condition Indexes and Variance-Decomposition Proportions
    condition
        index   _cons     age   ndr~1   nd~p2    race   treat    site   _Ii~2   _Ii~3   age~1
 1       1.00    0.00    0.00    0.00    0.00    0.01    0.01    0.01    0.00    0.00    0.00
 2       2.22    0.00    0.00    0.00    0.00    0.00    0.02    0.01    0.00    0.12    0.00
 3       2.38    0.00    0.00    0.00    0.00    0.00    0.01    0.05    0.37    0.03    0.00
 4       2.70    0.00    0.00    0.00    0.00    0.64    0.01    0.14    0.00    0.03    0.00
 5       3.23    0.00    0.00    0.00    0.00    0.20    0.05    0.69    0.14    0.00    0.00
 6       3.56    0.00    0.00    0.00    0.00    0.03    0.80    0.03    0.12    0.06    0.00
 7       6.12    0.02    0.02    0.00    0.00    0.12    0.08    0.06    0.32    0.70    0.00
 8      13.34    0.07    0.06    0.01    0.02    0.00    0.02    0.00    0.03    0.02    0.21
 9      24.83    0.10    0.43    0.03    0.35    0.00    0.00    0.00    0.00    0.03    0.30
10      52.18    0.81    0.49    0.96    0.62    0.00    0.00    0.00    0.00    0.01    0.49
. prnt_cx, force w(5)
Condition Indexes and Variance-Decomposition Proportions
    condition
        index   _cons     age   ndr~1   nd~p2    race   treat    site   _Ii~2   _Ii~3   age~1
 1       1.00     .       .       .       .       .       .       .       .       .       .
 2       2.22     .       .       .       .       .       .       .       .       .       .
 3       2.38     .       .       .       .       .       .       .      0.37     .       .
 4       2.70     .       .       .       .      0.64     .       .       .       .       .
 5       3.23     .       .       .       .       .       .      0.69     .       .       .
 6       3.56     .       .       .       .       .      0.80     .       .       .       .
 7       6.12     .       .       .       .       .       .       .      0.32    0.70     .
 8      13.34     .       .       .       .       .       .       .       .       .       .
 9      24.83     .      0.43     .      0.35     .       .       .       .       .      0.30
10      52.18    0.81    0.49    0.96    0.62     .       .       .       .       .      0.49
Variance-Decomposition Proportions less than .3 have been printed as "."
Zero cell or Sparse data
- Exact logistic Regression
- Firth logistic Regression
Problem of perfect or complete separation
- Firth logistic Regression
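A small sketch of what complete separation looks like in practice. The six observations below are made up for illustration: every x above 3 has y = 1 and every x at or below 3 has y = 0, so ordinary maximum likelihood has no finite estimate. `exlogistic` is official Stata's exact logistic regression; `firthlogit` is a community-contributed command that is assumed to be installed separately, so it is left commented out.

```stata
* Made-up data with complete separation at x = 3.5
clear
input y x
0 1
0 2
0 3
1 4
1 5
1 6
end

* Ordinary logit stops because x predicts the outcome perfectly;
* capture noisily shows the message without aborting the do-file
capture noisily logit y x

* Exact logistic regression still returns an estimate
exlogistic y x

* Firth's penalized likelihood (community-contributed; install first)
* ssc install firthlogit
* firthlogit y x
```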
Binomial Overdispersion
- Scale SEs by Chi2 dispersion.
- Scale iteratively; Williams’ procedure.
- Robust variance estimators.
- Bootstrap or jackknife SE.
- Generalized binomial.
- Parameterize as a rate-count response model.
- Parameterize as a panel model.
- Nested logistic regression.
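Two of the remedies listed above, scaling the standard errors by the Pearson chi-squared dispersion and using a robust variance estimator, can be sketched with `glm` on grouped binomial data. The data below are made up for illustration (events out of n = 20 trials at each level of x) and the variable names are hypothetical.

```stata
* Made-up grouped binomial data: events out of n trials at each x
clear
input x n events
1 20 2
2 20 9
3 20 5
4 20 15
5 20 8
6 20 18
end

* Unadjusted binomial fit; (1/df) Pearson well above 1 suggests overdispersion
glm events x, family(binomial n) link(logit)

* Standard errors scaled by the Pearson chi-squared dispersion
glm events x, family(binomial n) link(logit) scale(x2)

* Robust (sandwich) variance estimator
glm events x, family(binomial n) link(logit) vce(robust)
```

The coefficient estimates are the same in all three fits; only the standard errors, and therefore the tests and confidence intervals, change.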