Chapter 7 Analysis of Variance - Seoul National University...
-
2014/4/29
Chapter 7 Analysis of Variance (ANalysis Of VAriance, ANOVA)
-
Analysis of variance: divide the total variation into several components and assess the contribution of particular components.
Aims: estimation and testing for the variances; estimation and testing for the means.
7.1 Introduction
-
Motivation
Comparison of two groups -> t-test
More than two groups -> pairwise t-tests are possible, but they are
. cumbersome and theoretically wrong (multiple-comparisons problem)
. inefficient, because each test uses only part of the data
Three or more groups, using the whole data -> ANOVA (response variable: continuous; explanatory variable: categorical)
-
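7.2 Completely Randomized Design
Treatments are assigned to the experimental units by complete randomization.
Data layout: one column per treatment, with the total and average of each treatment at the bottom.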
-
One-way analysis of variance
Example 7.2.1 Glucose
(glucose and insulin)
-
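Model
$x_{ij} = \mu_j + e_{ij} = \mu + \tau_j + e_{ij}$
$\mu = \frac{1}{k}\sum_{j=1}^{k}\mu_j$, $\tau_j = \mu_j - \mu$, $e_{ij} = x_{ij} - \mu_j$
$x_{ij}$: ij-th observation
$\mu_j$: mean of the j-th treatment group
$e_{ij}$: error of the ij-th observation
$\mu$: grand mean
$\tau_j$: effect of the j-th treatment group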
-
Assumptions of the model (mean, homogeneity of variance, normality, independence):
$e_{ij} \sim N(0, \sigma^2)$, independent
Hypotheses of the model:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: not all the $\mu_j$'s are the same
-
Same variances & same means
Same variances but different means
-
Sum of squares, total:
$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}x_{ij}^2 - \frac{T_{..}^2}{N}$
-
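$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2$
$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left[(x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \bar{x}_{..})\right]^2$
$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + 2\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})(\bar{x}_{.j} - \bar{x}_{..}) + \sum_{j=1}^{k}\sum_{i=1}^{n_j}(\bar{x}_{.j} - \bar{x}_{..})^2$
$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k}n_j(\bar{x}_{.j} - \bar{x}_{..})^2$
The cross-product term vanishes because $\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{.j}) = 0$ within every group.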
-
$SST = SSW + SSA$
(SSW: within-group SS; SSA: among (between)-group SS)
variance ratio (VR) $= MSA / MSW$
-> larger VR -> larger between-group SS relative to within-group SS -> reject $H_0$ -> groups are different -> bigger group effect!
-
(Heritability Example)
-
ANOVA Table
factor          Sum of squares   df      Mean square         Variance ratio
Within group    SSW              N - k   MSW = SSW/(N - k)
Between group   SSA              k - 1   MSA = SSA/(k - 1)   VR = MSA/MSW
total           SST              N - 1
-
SAS program
* file eg7_2_1.sas ;
data insul;
input glu ins ;
cards;
1 1.53
1 1.61
1 3.75
1 2.89
1 3.26
2 3.15
2 3.96
2 3.59
2 1.89
2 1.45
2 1.56
3 3.89
3 3.68
3 5.70
3 5.62
3 5.79
3 5.33
4 8.18
4 5.64
4 7.36
4 5.33
4 8.82
4 5.26
4 7.10
5 5.86
5 5.46
5 5.69
5 6.49
5 7.81
5 9.03
5 7.49
5 8.98
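;
run;
* group totals and means (the data are already sorted by glu) ;
proc means sum mean ;
  by glu;
  var ins ;
run;
* one-way ANOVA of insulin on glucose level ;
proc anova ;
  class glu ;
  model ins=glu ;
run;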
-
The ANOVA Procedure
Dependent Variable: ins
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Model     4      121.1854282    30.2963570     19.78
glu       4      121.1854282    30.2963570     19.78
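As a cross-check of the sums-of-squares formulas, the following is a minimal Python sketch (not part of the original SAS workflow) that rebuilds the one-way ANOVA quantities by hand from the data in the SAS program. The function name `one_way_anova` and the dictionary keys are illustrative choices, not anything defined in the slides.

```python
# Insulin response (ins) by glucose level (glu), copied from the SAS data step.
groups = {
    1: [1.53, 1.61, 3.75, 2.89, 3.26],
    2: [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    3: [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    4: [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    5: [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
}


def one_way_anova(groups):
    """Compute SSA, SSW, SST, mean squares and the variance ratio."""
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)   # N
    k = len(groups)             # number of treatment groups
    grand_mean = sum(all_values) / n_total

    ssa = 0.0  # among (between)-group sum of squares
    ssw = 0.0  # within-group sum of squares
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ssa += len(values) * (group_mean - grand_mean) ** 2
        ssw += sum((x - group_mean) ** 2 for x in values)
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    msa = ssa / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSA": ssa, "SSW": ssw, "SST": sst,
            "df_among": k - 1, "df_within": n_total - k,
            "MSA": msa, "MSW": msw, "VR": msa / msw}


table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name:10s} {value:.4f}")
```

The among-group sum of squares and the variance ratio should agree with the Model row of the SAS output, and SSA + SSW should equal SST up to rounding.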
-
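Multiple Comparisons
ex) significance level = $\alpha$ for each test
Let $H_{01}: \mu_1 = 0$, $P(\text{do not reject } H_{01} \mid H_{01} \text{ is true}) = 1 - \alpha$
$H_{02}: \mu_2 = 0$, $P(\text{do not reject } H_{02} \mid H_{02} \text{ is true}) = 1 - \alpha$
then, for $H_0 = H_{01} \text{ and } H_{02}$ (treating the two tests as independent),
$P(\text{do not reject } H_0 \mid H_0) = P(\text{do not reject } H_{01} \text{ and do not reject } H_{02} \mid H_{01}, H_{02}) = (1-\alpha)(1-\alpha) = (1-\alpha)^2$
In general, if we want to test $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k = 0$, then $P(\text{do not reject } H_0 \mid H_0) = (1-\alpha)^k$.
With $k = 4$ and $\alpha = 0.05$: $(1-\alpha)^k = (0.95)^4 = 0.8145$, so $1 - 0.8145 = 0.1855$.
The overall $\alpha$ is 0.1855, not 0.05 -> inflated type I error!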
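The inflation of the overall type I error can be checked numerically. This is a small sketch under the same independence assumption as the derivation; the function name `familywise_error` is an illustrative choice.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection among k independent
    tests, each carried out at significance level alpha."""
    return 1 - (1 - alpha) ** k


# Overall error rate for 1, 2, 4 and 10 tests at alpha = 0.05.
for k in (1, 2, 4, 10):
    print(k, round(familywise_error(0.05, k), 4))
```

For k = 4 this gives the 0.1855 quoted in the slide; the rate keeps growing as more tests are added.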
-
Bonferroni Correction: set the individual significance level to $\alpha/m$;
the overall significance level is then about $\alpha$ for m multiple tests.
m=4: $(1 - \frac{0.05}{4})^4 \approx 0.95 = 1 - 0.05$
example) When we have 10 hypotheses,
Individual p=0.05 -> multiple comparisons problem
(too many false findings)
Individual p = $\frac{0.05}{10} = 0.005$
This threshold is the Bonferroni-corrected significance level (often loosely called the Bonferroni-corrected p-value).
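A short sketch of the correction in use. The p-values below are hypothetical numbers made up for illustration, and the helper names are assumptions rather than anything from the slides.

```python
def bonferroni_level(alpha, m):
    """Individual significance level for m tests at overall level alpha."""
    return alpha / m


def overall_level(individual_alpha, m):
    """Overall type I error for m independent tests at individual_alpha."""
    return 1 - (1 - individual_alpha) ** m


# Hypothetical p-values from m = 10 tests (illustration only).
p_values = [0.001, 0.004, 0.012, 0.030, 0.049, 0.20, 0.35, 0.51, 0.72, 0.90]
threshold = bonferroni_level(0.05, len(p_values))

naive = [p for p in p_values if p < 0.05]           # uncorrected findings
corrected = [p for p in p_values if p < threshold]  # Bonferroni findings

print("threshold:", threshold)
print("uncorrected findings:", len(naive))
print("Bonferroni findings:", len(corrected))
print("overall level for m=4:", round(overall_level(0.05 / 4, 4), 4))
```

With these made-up p-values, five tests pass the uncorrected 0.05 cut-off but only two pass the corrected one, which is the trade-off the slide describes.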
-
Detecting pairwise differences
After rejecting $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$, which pairs have larger differences?
1. LSD (least significant difference)
2. Duncan's new multiple range test
-
3. Tukey HSD (honestly significant difference)
$HSD = q_{\alpha,k,N-k}\sqrt{\frac{MSE}{n}}$ when the $n_j$'s are the same
$HSD^* = q_{\alpha,k,N-k}\sqrt{\frac{MSE}{n_j^*}}$, $n_j^*$: sample size of the smaller cell
-
7.2.2 Pairwise mean-differences of glucose example
-
7.2.6 Pairwise comparisons by Tukey's HSD test
-
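Interpolation of the critical value q
e.g. tabled values: q = 4.17 at df = 24 and q = 4.10 at df = 30; df = 27 is needed.
0.07 : (30 - 24) = x : (27 - 24)
6x = 0.07 * 3
x = 0.07 * 3 / 6 = 0.035
q = 4.17 - 0.035 = 4.135, approximately 4.14
Ordered group means: 2.60 2.61 5.00 6.81 7.10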
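The proportional argument above is ordinary linear interpolation. A minimal sketch, with `interpolate` as an illustrative helper name:

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0), (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Tabled studentized-range values at df = 24 and df = 30; df = 27 is needed.
q_27 = interpolate(27, 24, 4.17, 30, 4.10)
print(round(q_27, 3))
```

The interpolated value is close to, but not identical with, the critical value 4.13047 that SAS reports for 27 error degrees of freedom, because the true relationship is not exactly linear in df.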
-
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for ins
NOTE: This test controls the Type I experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 27
Error Mean Square 1.531755
Critical Value of Studentized Range 4.13047
Comparisons significant at the 0.05 level are indicated by ***.
   glu        Difference     Simultaneous 95%
Comparison  Between Means   Confidence Limits
  5 - 4         0.2884      -1.5824   2.1592
  5 - 3         2.0996       0.1474   4.0518   ***
  5 - 1         4.4933       2.4325   6.5540   ***
  5 - 2         4.5013       2.5491   6.4534   ***
  4 - 5        -0.2884      -2.1592   1.5824
  4 - 3         1.8112      -0.1999   3.8223
  4 - 1         4.2049       2.0883   6.3214   ***
  4 - 2         4.2129       2.2018   6.2239   ***
proc anova ;
class glu ; model ins=glu ;
means glu /Tukey ; run;
-
Comparisons significant at the 0.05 level are indicated by ***.
   glu        Difference     Simultaneous 95%
Comparison  Between Means   Confidence Limits
  5 - 4         0.2884      -1.5824   2.1592
  5 - 3         2.0996       0.1474   4.0518   ***
  5 - 1         4.4933       2.4325   6.5540   ***
  5 - 2         4.5013       2.5491   6.4534   ***
  4 - 5        -0.2884      -2.1592   1.5824
  4 - 3         1.8112      -0.1999   3.8223
  4 - 1         4.2049       2.0883   6.3214   ***
  4 - 2         4.2129       2.2018   6.2239   ***
  3 - 5        -2.0996      -4.0518  -0.1474   ***
  3 - 4        -1.8112      -3.8223   0.1999
  3 - 1         2.3937       0.2048   4.5825   ***
  3 - 2         2.4017       0.3147   4.4886   ***
  1 - 5        -4.4933      -6.5540  -2.4325   ***
  1 - 4        -4.2049      -6.3214  -2.0883   ***
  1 - 3        -2.3937      -4.5825  -0.2048   ***
  1 - 2         0.0080      -2.1808   2.1968
  2 - 5        -4.5013      -6.4534  -2.5491   ***
  2 - 4        -4.2129      -6.2239  -2.2018   ***
  2 - 3        -2.4017      -4.4886  -0.3147   ***
  2 - 1        -0.0080      -2.1968   2.1808
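The confidence limits in this output can be reproduced by hand. The sketch below assumes the Tukey-Kramer form for unequal group sizes, half-width $= q\sqrt{\frac{MSE}{2}(\frac{1}{n_a} + \frac{1}{n_b})}$, which is an assumption about how the limits were computed (it differs from the smaller-cell $HSD^*$ formula, but it matches the printed limits numerically). The critical value and error mean square are taken from the SAS output; the function name is illustrative.

```python
from math import sqrt

Q_CRIT = 4.13047  # "Critical Value of Studentized Range" in the SAS output
MSE = 1.531755    # "Error Mean Square" in the SAS output

# Insulin response by glucose level, copied from the SAS data step.
groups = {
    1: [1.53, 1.61, 3.75, 2.89, 3.26],
    2: [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    3: [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    4: [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    5: [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
}


def tukey_kramer_interval(a, b):
    """Difference of means (a - b) with simultaneous confidence limits,
    assuming the Tukey-Kramer adjustment for unequal group sizes."""
    n_a, n_b = len(groups[a]), len(groups[b])
    diff = sum(groups[a]) / n_a - sum(groups[b]) / n_b
    half_width = Q_CRIT * sqrt(MSE / 2 * (1 / n_a + 1 / n_b))
    return diff, diff - half_width, diff + half_width


for a, b in [(5, 4), (5, 3), (4, 3), (1, 2)]:
    diff, lower, upper = tukey_kramer_interval(a, b)
    flag = "***" if lower > 0 or upper < 0 else ""
    print(f"{a} - {b}  {diff:8.4f} {lower:8.4f} {upper:8.4f}  {flag}")
```

A pair is flagged when its interval excludes zero, which is how the `***` marks in the SAS table can be read.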
-
Homework
Make ANOVA tables using the formulae (you may use MS Excel), and compare your results with the results from SAS.
Exercises: 7.2.2, 7.2.7
-
7.3 Randomized Complete Block Design
R.A. Fisher (1925): introduced to compare the yields of certain species (block = land). The other factors (the treatments) are randomized within each block.
Data layout: one column per treatment and one row per block, with totals and averages in the margins.
-
Example 7.3.1
Number of days to learn how to use a dental device
Treatments: teaching methods; blocks: age
-
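Model
$x_{ij} = \mu + \beta_i + \tau_j + e_{ij}$ ($\beta_i$: block effect, $\tau_j$: trt effect)
$e_{ij} = x_{ij} - (\mu + \beta_i + \tau_j) \sim N(0, \sigma^2)$
Hypothesis:
$H_0: \tau_j = 0$, $j = 1, 2, \ldots, k$
$H_A$: $H_0$ is not true; some $\tau_j \neq 0$.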
-
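$SST = \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{..})^2$
$\sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n}(\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}(\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n}(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2$
$SST = SSBl + SSTr + SSE$
df: $nk - 1 = (n - 1) + (k - 1) + (n - 1)(k - 1)$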
-
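ANOVA table
factor   Sum of squares   df               Mean square                      Variance ratio
trt      SSTr             k - 1            MSTr = SSTr/(k - 1)              VR = MSTr/MSE
block    SSBl             n - 1            MSBl = SSBl/(n - 1)
error    SSE              (n - 1)(k - 1)   MSE = SSE/((n - 1)(k - 1))
total    SST              nk - 1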
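The block-design decomposition can be sketched in a few lines of Python. The data below are hypothetical (the measurements of Example 7.3.1 are not in this transcript), and `rcbd_anova` is an illustrative name.

```python
# Hypothetical data: rows are blocks (i = 1..n), columns are treatments (j = 1..k).
data = [
    [7.0, 9.0, 10.0],
    [8.0, 9.0, 12.0],
    [9.0, 11.0, 12.0],
    [10.0, 12.0, 14.0],
]


def rcbd_anova(data):
    """Sums of squares, df and variance ratio for a randomized complete
    block design with one observation per block-treatment cell."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    block_means = [sum(row) / k for row in data]
    trt_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ssbl = k * sum((m - grand) ** 2 for m in block_means)
    sstr = n * sum((m - grand) ** 2 for m in trt_means)
    sse = sum((data[i][j] - block_means[i] - trt_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    sst = sum((x - grand) ** 2 for row in data for x in row)

    df = {"block": n - 1, "trt": k - 1,
          "error": (n - 1) * (k - 1), "total": n * k - 1}
    vr = (sstr / df["trt"]) / (sse / df["error"])
    return {"SSBl": ssbl, "SSTr": sstr, "SSE": sse, "SST": sst,
            "df": df, "VR": vr}


result = rcbd_anova(data)
print(result)
```

Computing SSE directly from the residuals, rather than by subtraction, makes the identity SST = SSBl + SSTr + SSE a genuine check on the arithmetic.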
-
Homework
7.3.4 (SAS)
-
7.4 Factorial Design
Reduction of response time = f(drug level, age): drug level (min, med, max) * age (mid, old)
(Without interaction)
Factor A (age) \ Factor B (drug level)   j=1   j=2   j=3
Mid (i=1)                                  5    10    20
Old (i=2)                                 10    15    25
[Figure: reduction of response time plotted against age and against drug level]
-
(With interaction)
A \ B    j=1   j=2   j=3   j=2-1   j=3-2
(i=1)      5    10    20       5      10
(i=2)     15    10     5      -5      -5
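The two tables of cell means can be compared directly: without interaction, the gap between the two levels of A is the same at every level of B. A minimal sketch, with illustrative helper names:

```python
# Cell means from the two tables above: rows i = 1, 2 (factor A),
# columns j = 1, 2, 3 (factor B).
without_interaction = [[5, 10, 20], [10, 15, 25]]
with_interaction = [[5, 10, 20], [15, 10, 5]]


def increments(row):
    """Change between adjacent levels of B within one level of A."""
    return [b - a for a, b in zip(row, row[1:])]


def row_gaps(table):
    """Difference between the two levels of A at each level of B."""
    return [second - first for first, second in zip(table[0], table[1])]


def has_interaction(table):
    """Interaction is present when the A gap changes across levels of B."""
    return len(set(row_gaps(table))) > 1


print(row_gaps(without_interaction), has_interaction(without_interaction))
print(row_gaps(with_interaction), has_interaction(with_interaction))
print(increments(with_interaction[0]), increments(with_interaction[1]))
```

The `increments` helper reproduces the j=2-1 and j=3-2 columns of the second table: the drug-level effect changes sign between the two age groups, which is the interaction.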
-
Two-factor design (2 factors): Factor A, Factor B
-
Example 7.4.2
Time a nurse stays at the patient's home = f(age of the nurse, disease of the patient)
Model
$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, n$
-
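Hypotheses
$H_0: \alpha_i = 0$, $i = 1, \ldots, a$; $H_A$: not $H_0$, $\alpha_i \neq 0$ for some $i$.
$H_0: \beta_j = 0$, $j = 1, \ldots, b$; $H_A$: not $H_0$, $\beta_j \neq 0$ for some $j$.
$H_0: (\alpha\beta)_{ij} = 0$, $i = 1, \ldots, a$, $j = 1, \ldots, b$; $H_A$: not $H_0$, $(\alpha\beta)_{ij} \neq 0$ for some $i, j$.
SST=SSA+SSB+SSAB+SSE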
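The identity SST = SSA + SSB + SSAB + SSE can be illustrated for a balanced design with replication. The observations below are hypothetical: two made-up replicates per cell, chosen so that the cell means equal the "with interaction" table of the drug example. `two_way_ss` is an illustrative name.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_ss(cells):
    """Sums of squares for a balanced two-factor design.
    cells[i][j] holds the n replicate observations for level i of
    factor A and level j of factor B."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ssa = b * n * sum((m - grand) ** 2 for m in a_means)
    ssb = a * n * sum((m - grand) ** 2 for m in b_means)
    ssab = n * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((x - cell_means[i][j]) ** 2
              for i in range(a) for j in range(b) for x in cells[i][j])
    sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst}


# Hypothetical replicates (n = 2 per cell); cell means are 5, 10, 20 / 15, 10, 5.
cells = [
    [[4, 6], [9, 11], [19, 21]],
    [[14, 16], [9, 11], [4, 6]],
]
ss = two_way_ss(cells)
print(ss)
```

This sketch only covers the balanced case (equal n in every cell); with unequal cell sizes the sums of squares no longer add up this simply.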
-
ANOVA table (sources: factor, treatment, error, total)
> qf(0.95,3,64)
[1] 2.748191
> qf(0.95,9,64)
[1] 2.029792
> qf(0.95,15,64)
[1] 1.825586
> 1-pf(67.95,3,64)
[1] 0
> 1-pf(27.27,9,64)
[1] 0
> 1-pf(4.61,15,64)
[1] 7.473861e-06
-
Homework
7.4.2 (SAS)
7.4.3 (SAS)
-
7.5 Miscellaneous
Log transformation: use when the normality assumption is violated (e.g. income, concentration).
If normality is still problematic even after the variable transformation, or the sample size is too small to check normality -> nonparametric approach
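A rough illustration of why a log transformation helps with right-skewed variables such as income or concentration. The values are hypothetical (a made-up geometric sequence), and sample skewness is used here only as a crude symmetry check, not as a formal normality test.

```python
from math import log


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the
    second central moment (moments divided by n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (illustration only).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [log(x) for x in raw]

print("skewness before log:", round(skewness(raw), 3))
print("skewness after log: ", round(skewness(logged), 3))
```

The raw values have a long right tail, while their logarithms are evenly spaced and therefore symmetric about their mean.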
-
One Way ANOVA
model: $Y_{ij} = \mu + A_i + \varepsilon_{ij}$
Type of Sum of Squares
* Type I: sequential (if we know the relative importance of the variables)
Type II: partial, without interaction terms
** Type III: partial, with interactions (if we don't know the relative importance of the variables)
Type IV: for designs with missing cells (if there are none, same as Type III)
*, **: defaults
-
One Way ANOVA, mod12.sas
/* File : mod12.sas
   To demonstrate one way ANOVA */
filename in 'd:\intro\taillite.dat';
data one;
  infile in;
  input id vehtype group position speedzn resptime follotme folltmec ;
  if group = 1;   /* subsetting IF: keep only observations with group = 1 */
run;
/* BY-group processing below needs the data sorted by vehtype */
proc sort ; by vehtype ;
run;
proc means;
  var resptime;
  by vehtype ;
  title 'Means of Response Time by Vehicle Type';
run;
/* box plot of response time for each vehicle type */
proc gplot ;
  plot resptime*vehtype ;
  symbol i=box;
  title 'Box Plot Response Time by Vehicle Type';
run;
proc anova;
  class vehtype;
  model resptime = vehtype ;
  /* request several multiple-comparison procedures at once */
  means vehtype /tukey lines bon cldiff scheffe snk lsd ;
  title 'One way ANOVA for Tail Light Study';
  title2 ;   /* clear any second title line */
run;
-
Two Way ANOVA, mod13.sas
/* File : mod13.sas
   To demonstrate two way ANOVA */
filename stiff 'd:\intro\dummy.dat';
data one;
  infile stiff;
  /* species and impactor are character variables ($) */
  input species $ impactor $ stiff1 stiff2 calcium magnesm ;
run;
/* block chart of mean stiff1 for each species within impactor */
proc gchart ;
  block species / group=impactor sumvar=stiff1 type=mean ;
  title 'Block Chart of Stiff1 by Impactor and Species';
run;
proc anova;
  class species impactor;
  /* two main effects plus their interaction */
  model stiff1 = species impactor species*impactor ;
  means species impactor / duncan lines ;
  title 'Two way ANOVA Dummy Data';
run;