Chapter 7 Analysis of Variance (분산분석) - Seoul National University

Source: hosting03.snu.ac.kr/~hokim/int/2014/chap7.pdf · 2014-05-14

  • 2014/4/29

    Chapter 7 Analysis of Variance (ANalysis Of VAriance, ANOVA)

  • Analysis of variance: divide the total variation into several components and assess the contribution of particular components.

    Aims: estimation and testing for the variances; estimation and testing for the means.

    7.1 Introduction

  • Motivation

    Comparison of two groups -> t-test.

    More than two groups -> repeating the t-test for every pair (pairwise t-tests) is cumbersome and theoretically wrong (multiple-comparisons problem).

    It is also inefficient, because each pairwise test uses only part of the data.

    Three or more groups, using the whole data -> ANOVA (response variable: continuous; explanatory variable: categorical).

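  • 7.2 Completely Randomized Design

    Design: treatments are assigned to the experimental units by complete randomization.

    [Table: data layout, one column per treatment, with the total and average of each column]

  • One-way analysis of variance

    7.2.1 Glucose

    (glucose & insulin)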

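  • Model

    $x_{ij} = \mu_j + e_{ij} = \mu + \tau_j + e_{ij}$

    where

    $x_{ij}$: ij-th observation (i-th observation in the j-th treatment group)

    $\mu_j$: mean of the j-th treatment group

    $e_{ij} = x_{ij} - \mu_j$: error of the ij-th observation

    $\mu = \sum_{j=1}^{k} \mu_j / k$: grand mean

    $\tau_j = \mu_j - \mu$: effect of the j-th treatment group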

  • Assumptions of the model (mean, variance (homogeneity), normality, independence)

    $e_{ij} \sim N(0, \sigma^2)$, independent

    Hypothesis of the model

    $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

    $H_A$: not all the $\mu_j$'s are the same

  • [Figure: group distributions with the same variances and the same means]

    [Figure: group distributions with the same variances but different means]

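  • Total sum of squares (sum of squares, total)

    $SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{N}$

    where $T_{..}$ is the grand total of all observations and $N = \sum_{j=1}^{k} n_j$.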

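  • Decomposition of SST

    $SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2$

    $= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j} + \bar{x}_{.j} - \bar{x}_{..})^2$

    $= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + 2 \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})(\bar{x}_{.j} - \bar{x}_{..}) + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (\bar{x}_{.j} - \bar{x}_{..})^2$

    $= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2$

    The cross term vanishes because $\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j}) = 0$ within each group.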

  • $SST = SSW + SSA$

    SSW: within-group SS; SSA: among (between)-group SS

    Variance ratio: $VR = MSA / MSW$

    -> larger VR -> between-group SS is large relative to within-group SS -> groups are different -> bigger group effect!

  • (Heritability Example)

  • ANOVA Table

    Factor          Sum of squares   df      Mean square         Variance ratio
    Within group    SSW              N - k   MSW = SSW/(N - k)
    Between group   SSA              k - 1   MSA = SSA/(k - 1)   VR = MSA/MSW
    Total           SST              N - 1

  • SAS program

    * file eg7_2_1.sas ;

    data insul;

    input glu ins ;

    cards;

    1 1.53

    1 1.61

    1 3.75

    1 2.89

    1 3.26

    2 3.15

    2 3.96

    2 3.59

    2 1.89

    2 1.45

    2 1.56

    3 3.89

    3 3.68

    3 5.70

    3 5.62

    3 5.79

    3 5.33

    4 8.18

    4 5.64

    4 7.36

    4 5.33

    4 8.82

    4 5.26

    4 7.10

    5 5.86

    5 5.46

    5 5.69

    5 6.49

    5 7.81

    5 9.03

    5 7.49

    5 8.98

    ;run;

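    * totals and means of ins within each glu group (the data are already sorted by glu) ;

    proc means sum mean ;

    by glu;

    var ins ;

    run;

    * one-way ANOVA of ins on glu ;

    proc anova ; class glu ; model ins=glu ; run;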

  • The ANOVA Procedure

    Dependent Variable: ins

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model     4   121.1854282      30.2963570    19.78
    ...
    glu       4   121.1854282      30.2963570    19.78
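The SAS figures can be cross-checked by applying the sum-of-squares formulae directly. The following is a minimal Python sketch, not part of the original slides; the function name `one_way_anova` and the dictionary layout are illustrative choices, and the data are copied from the eg7_2_1.sas data step.

```python
# Insulin values by glucose group, copied from the SAS data step (eg7_2_1.sas).
groups = {
    1: [1.53, 1.61, 3.75, 2.89, 3.26],
    2: [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    3: [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    4: [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    5: [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
}


def one_way_anova(groups):
    """Return SSA, SSW, SST, MSA, MSW and the variance ratio VR (hypothetical helper)."""
    values = [x for xs in groups.values() for x in xs]
    n_total = len(values)                    # N
    k = len(groups)                          # number of treatment groups
    grand_mean = sum(values) / n_total
    means = {j: sum(xs) / len(xs) for j, xs in groups.items()}
    # Among (between)-group SS: sum over groups of n_j * (group mean - grand mean)^2
    ssa = sum(len(xs) * (means[j] - grand_mean) ** 2 for j, xs in groups.items())
    # Within-group SS: squared deviations from each observation's own group mean
    ssw = sum((x - means[j]) ** 2 for j, xs in groups.items() for x in xs)
    sst = sum((x - grand_mean) ** 2 for x in values)
    msa = ssa / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSA": ssa, "SSW": ssw, "SST": sst, "MSA": msa, "MSW": msw, "VR": msa / msw}


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

SSA, MSA and VR should agree with the Model row of the SAS output (121.1854, 30.2964 and 19.78), and SST should equal SSW + SSA up to rounding.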

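  • Multiple Comparisons

    ex) significance level = $\alpha$ for a single test

    Let $H_{01}: \mu_1 = 0$, with $P(\text{do not reject } H_{01} \mid H_{01} \text{ is true}) = 1 - \alpha$

    and $H_{02}: \mu_2 = 0$, with $P(\text{do not reject } H_{02} \mid H_{02} \text{ is true}) = 1 - \alpha$

    then, where $H_0 = H_{01}$ and $H_{02}$, and assuming the two tests are independent,

    $P(\text{do not reject } H_0 \mid H_0 \text{ is true}) = P(\text{do not reject } H_{01} \text{ and do not reject } H_{02} \mid H_{01}, H_{02}) = (1 - \alpha)(1 - \alpha) = (1 - \alpha)^2$

    In general, if we want to test $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k = 0$, then with $\alpha = 0.05$ and $k = 4$

    $1 - (1 - \alpha)^k = 1 - (.95)^4 = 1 - 0.8145 = 0.1855$

    The overall $\alpha$ is 0.1855, not 0.05 -> inflated type I error!!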
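The inflation of the overall type I error can be reproduced numerically. This is a minimal Python sketch assuming independent tests; the function name `familywise_error` is a hypothetical helper, not something from the slides.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


# The slide's case: alpha = 0.05 and k = 4 tests.
print(round(familywise_error(0.05, 4), 4))

# The overall error keeps growing as more tests are added.
for k in (1, 2, 4, 10):
    print(k, round(familywise_error(0.05, k), 4))
```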

  • Bonferroni Correction: set the individual significance level to $\alpha / m$; then

    the overall significance level is about $\alpha$ for m multiple tests.

    m=4: $(1 - \frac{0.05}{4})^4 \approx 0.95 = 1 - 0.05$

    example) When we have 10 hypotheses,

    Individual p=0.05 -> multiple comparisons problem (too many false findings)

    Individual p = $0.05 / 10 = 0.005$

    This is often called the Bonferroni-corrected p-value threshold.
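A short Python sketch of the correction, again assuming independent tests; `bonferroni_level` and `overall_level` are illustrative names introduced here, not from the slides.

```python
def bonferroni_level(alpha, m):
    """Individual significance level that keeps the overall level near alpha for m tests."""
    return alpha / m


def overall_level(individual_alpha, m):
    """Overall type I error for m independent tests, each at individual_alpha."""
    return 1 - (1 - individual_alpha) ** m


# m = 4 tests: each test is run at 0.05 / 4, and the overall level stays just below 0.05.
level_4 = bonferroni_level(0.05, 4)
print(level_4, round(overall_level(level_4, 4), 4))

# m = 10 hypotheses: the individual threshold becomes 0.005.
print(bonferroni_level(0.05, 10))
```

The overall level after correction is slightly below the nominal 0.05, which is why the slide writes "about".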

  • Detecting pairwise differences

    After rejecting $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$, which pairs have larger differences?

    1. LSD (least significant difference)

    2. Duncan's new multiple range test

  • 3. Tukey HSD (honestly significant difference)

    $HSD = q_{\alpha, k, N-k} \sqrt{MSE / n}$ when the $n_j$'s are the same ($n_j = n$)

    $HSD^* = q_{\alpha, k, N-k} \sqrt{MSE / n_j^*}$ when the $n_j$'s differ, where $n_j^*$: sample size of the smaller cell

  • 7.2.2 Pairwise mean-differences of glucose example

  • 7.2.6 Pairwise comparisons by Tukey's HSD test

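  • Linear interpolation of the critical value q

    e.g. the table gives q = 4.17 at error df = 24 and q = 4.10 at error df = 30; the value at df = 27 is needed.

    $0.07 : (30 - 24) = x : (27 - 24)$

    $6x = 0.07 \times 3$

    $x = 0.07 \times 3 / 6 = 0.035$

    $q = 4.17 - 0.035 = 4.135 \approx 4.14$

    Group means in increasing order: 2.60 2.61 5.00 6.81 7.10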
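The interpolation step can be written as a small function. This is a minimal Python sketch; `interpolate` is a hypothetical helper, and the two tabulated points (df 24 -> 4.17, df 30 -> 4.10) are the ones used on the slide.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Critical value of the studentized range at error df = 27,
# from the tabulated values at df = 24 and df = 30.
q27 = interpolate(27, 24, 4.17, 30, 4.10)
print(round(q27, 3))
```

The interpolated value is only an approximation: the SAS output for this example reports a critical value of 4.13047 for 27 error degrees of freedom.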

  • The ANOVA Procedure

    Tukey's Studentized Range (HSD) Test for ins

    NOTE: This test controls the Type I experimentwise error rate.

    Alpha 0.05

    Error Degrees of Freedom 27

    Error Mean Square 1.531755

    Critical Value of Studentized Range 4.13047

    Comparisons significant at the 0.05 level are indicated by ***.

    Difference

    glu Between Simultaneous 95%

    Comparison Means Confidence Limits

    5 - 4 0.2884 -1.5824 2.1592

    5 - 3 2.0996 0.1474 4.0518 ***

    5 - 1 4.4933 2.4325 6.5540 ***

    5 - 2 4.5013 2.5491 6.4534 ***

    4 - 5 -0.2884 -2.1592 1.5824

    4 - 3 1.8112 -0.1999 3.8223

    4 - 1 4.2049 2.0883 6.3214 ***

    4 - 2 4.2129 2.2018 6.2239 ***

    proc anova ;

    class glu ; model ins=glu ;

    means glu /Tukey ; run;

  • Comparisons significant at the 0.05 level are indicated by ***.

    glu          Difference      Simultaneous 95%
    Comparison   Between Means   Confidence Limits
    5 - 4         0.2884         -1.5824    2.1592
    5 - 3         2.0996          0.1474    4.0518   ***
    5 - 1         4.4933          2.4325    6.5540   ***
    5 - 2         4.5013          2.5491    6.4534   ***
    4 - 5        -0.2884         -2.1592    1.5824
    4 - 3         1.8112         -0.1999    3.8223
    4 - 1         4.2049          2.0883    6.3214   ***
    4 - 2         4.2129          2.2018    6.2239   ***
    3 - 5        -2.0996         -4.0518   -0.1474   ***
    3 - 4        -1.8112         -3.8223    0.1999
    3 - 1         2.3937          0.2048    4.5825   ***
    3 - 2         2.4017          0.3147    4.4886   ***
    1 - 5        -4.4933         -6.5540   -2.4325   ***
    1 - 4        -4.2049         -6.3214   -2.0883   ***
    1 - 3        -2.3937         -4.5825   -0.2048   ***
    1 - 2         0.0080         -2.1808    2.1968
    2 - 5        -4.5013         -6.4534   -2.5491   ***
    2 - 4        -4.2129         -6.2239   -2.2018   ***
    2 - 3        -2.4017         -4.4886   -0.3147   ***
    2 - 1        -0.0080         -2.1968    2.1808
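The confidence limits in this output can be reproduced by hand. The sketch below is an illustration in Python, not part of the slides. It assumes the limits follow the Tukey-Kramer form for unequal group sizes, difference ± q · sqrt((MSE / 2)(1/n_a + 1/n_b)), which is an assumption checked numerically here rather than something the slides state. The critical value and MSE are taken from the SAS output, and the group totals are summed from the eg7_2_1.sas data.

```python
import math

MSE = 1.531755      # Error Mean Square reported by SAS
Q_CRIT = 4.13047    # Critical Value of Studentized Range reported by SAS

# (total of ins, sample size) for each glu group, summed from the eg7_2_1.sas data
group_totals = {1: (13.04, 5), 2: (15.60, 6), 3: (30.01, 6), 4: (47.69, 7), 5: (56.81, 8)}


def tukey_kramer_interval(a, b):
    """Difference of means a - b with simultaneous limits (assumed Tukey-Kramer form)."""
    total_a, n_a = group_totals[a]
    total_b, n_b = group_totals[b]
    diff = total_a / n_a - total_b / n_b
    half_width = Q_CRIT * math.sqrt(MSE / 2 * (1 / n_a + 1 / n_b))
    return diff, diff - half_width, diff + half_width


for a, b in [(5, 4), (5, 3), (1, 2)]:
    diff, lower, upper = tukey_kramer_interval(a, b)
    flag = "***" if lower > 0 or upper < 0 else ""
    print(f"{a} - {b}: {diff:8.4f} {lower:8.4f} {upper:8.4f} {flag}")
```

A pair is flagged when its interval excludes 0, which matches the `***` marks in the table for the pairs checked.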

  • Homework

    Make ANOVA tables using the formulae (you may use MS Excel). Compare your results with the results from SAS.

    7.2.2

    7.2.7

  • 7.3 Randomized complete block design

    R. A. Fisher (1925): to compare the yields of certain species (block = land). Randomize (with respect to other factors) within a block.

    [Table: data layout, treatments by blocks, with totals and averages in the margins]

  • 7.3.1

    Response: number of days to learn how to use a dental device

    Treatments: teaching methods; blocks: age

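  • Model

    $x_{ij} = \mu + \beta_i + \tau_j + e_{ij}$

    $e_{ij} = x_{ij} - (\mu + \beta_i + \tau_j) \sim N(0, \sigma^2)$

    $\beta_i$: block effect, $\tau_j$: trt effect

    Hypothesis

    $H_0: \tau_j = 0, \; j = 1, 2, \ldots, k$

    $H_A$: "all $\tau_j = 0$" is not true; some $\tau_j \neq 0$.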

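  • Decomposition of SST

    $\sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} (\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n} (\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2$

    $SST = SSBl + SSTr + SSE$

    df: $nk - 1 = (n - 1) + (k - 1) + (n - 1)(k - 1)$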

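  • ANOVA table

    Factor   Sum of squares   df               Mean square                  Variance ratio
    trt      SSTr             k - 1            MSTr = SSTr/(k - 1)          VR = MSTr/MSE
    block    SSBl             n - 1            MSBl = SSBl/(n - 1)
    error    SSE              (n - 1)(k - 1)   MSE = SSE/((n - 1)(k - 1))
    total    SST              nk - 1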

  • Homework

    7.3.4 (SAS)

  • 7.4 Factorial Design

    Response: reduction of response time. Factors: drug level (min, med, max) * age (mid, old).

    Without interaction:

    Factor A (age) \ Factor B (drug level)   j=1   j=2   j=3
    Mid (i=1)                                5     10    20
    Old (i=2)                                10    15    25

    [Figure: reduction of response time by age and by drug level]

  • With interaction:

    Factor A \ Factor B   j=1   j=2   j=3   (j=2)-(j=1)   (j=3)-(j=2)
    i=1                   5     10    20    5             10
    i=2                   15    10    5     -5            -5
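The difference columns suggest a simple check for interaction in a table of cell means: without interaction, the change from one level of Factor B to the next is the same at every level of Factor A. A minimal Python sketch using the two tables of cell means from the slides; the function names are hypothetical, and this is a descriptive check on cell means, not a significance test.

```python
def successive_differences(row):
    """Differences between adjacent levels of Factor B within one level of Factor A."""
    return [b - a for a, b in zip(row, row[1:])]


def is_additive(table):
    """True when every row shows the same successive differences (no interaction)."""
    diffs = [successive_differences(row) for row in table]
    return all(d == diffs[0] for d in diffs)


# Cell means from the slides: rows are levels of Factor A, columns are levels of Factor B.
without_interaction = [[5, 10, 20], [10, 15, 25]]
with_interaction = [[5, 10, 20], [15, 10, 5]]

print(is_additive(without_interaction))
print(is_additive(with_interaction))
print([successive_differences(row) for row in with_interaction])
```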

  • Two-factor design (2 factors)

    [Table: data layout, Factor A by Factor B]

  • 7.4.2

    Response: time a nurse stays at the patient's home. Factors: age of the nurse, disease of the patient.

    Model

    $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$

    $i = 1, \ldots, a; \; j = 1, \ldots, b; \; k = 1, \ldots, n$

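  • Hypotheses

    $H_0: \alpha_i = 0, \; i = 1, \ldots, a$; $H_A$: not $H_0$ ($\alpha_i \neq 0$ for some $i$).

    $H_0: \beta_j = 0, \; j = 1, \ldots, b$; $H_A$: not $H_0$ ($\beta_j \neq 0$ for some $j$).

    $H_0: (\alpha\beta)_{ij} = 0, \; i = 1, \ldots, a, \; j = 1, \ldots, b$; $H_A$: not $H_0$ ($(\alpha\beta)_{ij} \neq 0$ for some $i, j$).

    SST=SSA+SSB+SSAB+SSE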

  • [ANOVA table: rows for factor, treatment, error, total]

    > qf(0.95,3,64)
    [1] 2.748191

    > qf(0.95,9,64)
    [1] 2.029792

    > qf(0.95,15,64)
    [1] 1.825586

    > 1-pf(67.95,3,64)
    [1] 0

    > 1-pf(27.27,9,64)
    [1] 0

    > 1-pf(4.61,15,64)
    [1] 7.473861e-06

  • Homework

    7.4.2 (SAS)

    7.4.3 (SAS)

  • 7.5 Miscellaneous

    Log transformation: use when the normality assumption is violated (e.g. income, concentration).

    If normality is still problematic even after the variable transformation, or the sample size is too small to check normality -> nonparametric approach.

  • One Way ANOVA

    Type of Sum of Squares

    * Type I: sequential (if we know the relative importance of the variables)

    Type II: partial without interaction terms

    ** Type III: partial with interactions (if we don't know the relative importance of the variables)

    Type IV: for when there are missing cells (if none, same as Type III)

    *, **: defaults

    model: $Y_{ij} = \mu + A_i + \varepsilon_{ij}$

  • One Way ANOVA, mod12.sas

    /* File : mod12.sas

    To demonstrate one way ANOVA */

    filename in 'd:\intro\taillite.dat';

    data one;

    infile in;

    input id vehtype group position speedzn resptime follotme folltmec ;

    if group = 1;

    run;

    proc sort ;by vehtype ;

    proc means;

    var resptime;

    by vehtype ;

    title 'Means of Response Time by Vehicle Type';

    run;

    proc gplot ;

    plot resptime*vehtype ;

    symbol i=box;

    title 'Box Plot Response Time by Vehicle Type';

    run;

    proc anova;

    class vehtype;

    model resptime = vehtype ;

    means vehtype /tukey lines bon cldiff scheffe snk lsd ;

    title 'One way ANOVA for Tail Light Study';

    title2 ;

    run;

  • Two Way ANOVA, mod13.sas

    /* File : mod13.sas

    To demonstrate Two way ANOVA */

    filename stiff 'd:\intro\dummy.dat';

    data one;

    infile stiff;

    input species $ impactor $ stiff1 stiff2 calcium magnesm ;

    run;

    proc gchart ;

    block species / group=impactor sumvar=stiff1 type=mean ;

    title 'Block Chart of Stiff1 by Impactor and Species';

    run;

    proc anova;

    class species impactor;

    model stiff1 = species impactor species*impactor ;

    means species impactor / duncan lines ;

    title 'Two way ANOVA Dummy Data';

    run;
