chuong_4.pdf

Chapter 6. Department of Mathematics. Lecturer: Nguyễn Đình Huy. Chapter 4: Applying MS-Excel to Correlation and Regression Analysis. Correlation analysis; regression analysis (simple and multiple).

Upload: nguyen-pham

Post on 03-Oct-2015


  • Chapter 6

    Department of Mathematics. Lecturer: Nguyễn Đình Huy

    Chapter 4

    APPLYING MS-EXCEL TO CORRELATION AND REGRESSION ANALYSIS

    Correlation analysis

    Regression analysis

    Simple

    Multiple


    A- CORRELATION ANALYSIS

    6.1 Statistical concepts

    Two random variables Y and X may be linearly related (a and b), show a linear tendency (c), or be unrelated (d and c).

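    Pearson correlation coefficient:

    $$\rho_{X,Y} = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y}; \qquad \sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_X)^2; \qquad \sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \mu_Y)^2$$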

    Correlation analysis examines the direction and strength of a relationship, whereas regression analysis determines the quantitative relationship between two random variables Y and X. The correlation coefficient can be estimated by:

    $$R = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
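As a rough illustration of the estimate above, the following Python sketch computes R directly from the three sums. It borrows the time, temperature and yield columns of Example 17, given later in this chapter. The function name `pearson_r` and the variable names are assumptions made for this sketch and do not come from the original.

```python
import math

def pearson_r(x, y):
    """Sample correlation R = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Data of Example 17: time (min), temperature (deg C), yield (%)
time_min = [15, 30, 60, 15, 30, 60, 15, 30, 60]
temp_c = [105, 105, 105, 120, 120, 120, 135, 135, 135]
yield_pct = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]

r_time = pearson_r(time_min, yield_pct)
r_temp = pearson_r(temp_c, yield_pct)
print(f"R(time, yield)        = {r_time:.4f}")
print(f"R(temperature, yield) = {r_temp:.4f}")
```

The two values can be compared with the "Multiple R" entries of the single-variable Excel outputs shown later in the chapter.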

    The correlation coefficient is used to assess the degree of association:

    Value of |R|    Degree


    6.2.1 Entering data into the worksheet

    6.2.3 Using Correlation

    a- Click the Tools menu, then the Data Analysis command.

    b- Select the Correlation tool in the Data Analysis dialog box, then click OK.

    c- In the Correlation dialog box, specify the following in turn:

    Input range (Input Range),

    Grouping by rows or by columns (Grouped By),

    Data labels (Labels in First Row/Column),

    Output range (Output Range).

    [Figure: Correlation dialog box]

    Results

    Correlation coefficients: R (moisture/time) = 0.97; R (temperature/time) = 0.97; and R (moisture/temperature) = 0.95.
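The data behind the coefficients quoted above are not part of this transcript, so the sketch below uses the three columns of Example 17 (later in this chapter) instead. It builds the kind of lower-triangular correlation matrix that the Correlation tool writes to the output range when the data are grouped by columns. All names in the code are assumptions chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Columns of Example 17 (one variable per column, labels in the first row)
columns = {
    "Time": [15, 30, 60, 15, 30, 60, 15, 30, 60],
    "Temperature": [105, 105, 105, 120, 120, 120, 135, 135, 135],
    "Yield": [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26],
}

names = list(columns)
# Lower triangle only: each variable against itself and the earlier columns
matrix = {
    (a, b): pearson_r(columns[a], columns[b])
    for i, a in enumerate(names)
    for b in names[: i + 1]
}

for i, a in enumerate(names):
    cells = "".join(f"{matrix[(a, b)]:10.4f}" for b in names[: i + 1])
    print(f"{a:<12}{cells}")
```

In this balanced design the time and temperature columns are uncorrelated, which is why the single-variable slopes later reappear unchanged in the two-variable regression.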


    B- REGRESSION ANALYSIS

    6.4 Statistical concepts

    Linear regression analysis is widely used in science. For example, the regression line (line of best fit) is often used to predict the shelf life or expiry date of a drug.

    [Not recovered from the source: the regression line in its theoretical form and in its estimated form]

    The regression equation can be estimated by the method of least squares (least-squares estimation).


    C- SIMPLE LINEAR REGRESSION

    6.5 General equation

    $$Y_{|X} = B_0 + BX$$

    $$B_0 = \bar{Y} - B\bar{X}$$

    $$B = \frac{\sum X_i Y_i - \sum X_i \sum Y_i / N}{\sum X_i^2 - \left(\sum X_i\right)^2 / N}$$

    Y: the dependent (response) variable

    X: the independent (predictor) variable

    B0 and B: the regression coefficients
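The formulas for B and B0 translate almost line for line into code. The sketch below is a minimal illustration, assuming the temperature and yield columns of Example 17 (later in this chapter) as input. The name `fit_simple` is hypothetical.

```python
def fit_simple(x, y):
    """Return (B0, B) for Y = B0 + B*X using the sums form of least squares."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # B = (sum XY - sum X * sum Y / N) / (sum X^2 - (sum X)^2 / N)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # B0 = mean(Y) - B * mean(X)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

# Temperature (deg C) and yield (%) from Example 17
temp_c = [105, 105, 105, 120, 120, 120, 135, 135, 135]
yield_pct = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]

b0, b = fit_simple(temp_c, yield_pct)
print(f"B0 = {b0:.4f}, B = {b:.4f}")
```

The result can be checked against the Intercept and X2 coefficients in the Excel output for the temperature-only regression.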

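    ANOVA table

    | Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
    |---|---|---|---|---|
    | Regression | 1 | $SSR = \sum (Y'_i - \bar{Y})^2$ | MSR = SSR | F = MSR/MSE |
    | Error | N − 2 | $SSE = \sum (Y_i - Y'_i)^2$ | MSE = SSE/(N − 2) | |
    | Total | N − 1 | $SST = \sum (Y_i - \bar{Y})^2 = SSR + SSE$ | | |

    Here $Y'_i$ is the value predicted by the regression equation at $X_i$.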

    Statistics

    R squared (R Square):

    $$R^2 = \frac{SSR}{SST}$$

    ($100R^2$ is the percentage of the variation in Y that is explained by X.)

    Standard error (Standard Error):

    $$S = \sqrt{\frac{1}{N-2}\sum (Y_i - Y'_i)^2}$$

    (The less scattered the data, the closer S is to zero.)
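To make the ANOVA quantities concrete, here is a small sketch that computes SSR, SSE, SST, F, R squared and S for a one-variable fit. It assumes the time and yield columns of Example 17 (later in this chapter). The function name `anova_simple` and the dictionary keys are choices made for this illustration.

```python
import math

def anova_simple(x, y):
    """ANOVA quantities for the simple regression Y = B0 + B*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]       # Y'_i
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)       # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    sst = sum((yi - mean_y) ** 2 for yi in y)            # total
    mse = sse / (n - 2)
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "F": ssr / mse,          # MSR = SSR because df = 1
        "R2": ssr / sst,
        "S": math.sqrt(mse),
    }

# Time (min) and yield (%) from Example 17
time_min = [15, 30, 60, 15, 30, 60, 15, 30, 60]
yield_pct = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]

stats = anova_simple(time_min, yield_pct)
for key, value in stats.items():
    print(f"{key:>3} = {value:.6f}")
```

These figures can be compared with the ANOVA block and the Regression Statistics block of the Excel output for the time-only regression.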

    Statistical tests

    For a regression equation $Y_{|X} = B_0 + BX$, the statistical significance of each coefficient $B_i$ ($B_0$ or $B$) is assessed with the t test (Student distribution), whereas the adequacy of the equation $Y_{|X} = f(X)$ is assessed with the F test (Fisher distribution).

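    t test

    - Hypotheses:

    H0: $\beta_i = 0$ (the regression coefficient is not significant)

    H1: $\beta_i \neq 0$ (the regression coefficient is significant)

    - Test statistic:

    $$t = \frac{|B_i - \beta_i|}{\sqrt{S_n^2}}; \qquad S_n^2 = \frac{S^2}{\sum (X_i - \bar{X})^2}$$

    Under H0 ($\beta_i = 0$) this reduces, for the slope B, to

    $$t = \frac{|B|}{\sqrt{S_n^2}}$$

    Student distribution with $\gamma = N - 2$ degrees of freedom.

    - Decision:

    If $t < t_{\alpha}(N-2)$, accept H0.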

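    F test

    - Hypotheses:

    H0: $\beta_i = 0$ (the regression equation is not adequate)

    H1: $\beta_i \neq 0$ (the regression equation is adequate)

    - Test statistic:

    $$F = \frac{MSR}{MSE}$$

    Fisher distribution with $v_1 = 1$ and $v_2 = N - 2$ degrees of freedom.

    - Conclusion:

    If $F < F_{\alpha}(1, N-2)$, accept H0.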
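The two decision rules can be sketched in a few lines of Python. The example below assumes the data of Example 17 (later in this chapter) and takes the critical values 2.365 and 5.590 as quoted in that worked example for N − 2 = 7, since the standard library has no Student or Fisher quantile function. The function and constant names are hypothetical.

```python
import math

T_CRIT = 2.365  # t_0.05 with N - 2 = 7 df, as quoted in the worked example
F_CRIT = 5.590  # F_0.05(1, 7), as quoted in the worked example

def slope_tests(x, y):
    """Return (t, F) for the slope of the simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ssr = sum((intercept + slope * xi - mean_y) ** 2 for xi in x)
    s2 = sse / (n - 2)               # S^2 = MSE
    sn2 = s2 / s_xx                  # S_n^2 = S^2 / sum (X_i - mean X)^2
    t = abs(slope) / math.sqrt(sn2)  # under H0: beta = 0
    f = ssr / s2                     # MSR / MSE with MSR = SSR
    return t, f

time_min = [15, 30, 60, 15, 30, 60, 15, 30, 60]
temp_c = [105, 105, 105, 120, 120, 120, 135, 135, 135]
yield_pct = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]

results = {}
for name, x in (("time", time_min), ("temperature", temp_c)):
    t, f = slope_tests(x, yield_pct)
    results[name] = (t, f)
    t_verdict = "reject H0" if t > T_CRIT else "accept H0"
    f_verdict = "reject H0" if f > F_CRIT else "accept H0"
    print(f"{name}: t = {t:.3f} ({t_verdict}), F = {f:.3f} ({f_verdict})")
```

With a single predictor the two tests always agree, because F equals the square of the slope's t statistic.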


    D- MULTIPLE LINEAR REGRESSION

    In a multiple linear regression equation, the dependent variable Y is related to k independent variables $X_i$ ($i = 1, 2, \ldots, k$) rather than to a single one as in simple linear regression.

    General equation:

    $$Y_{|X_1, X_2, \ldots, X_k} = B_0 + B_1 X_1 + B_2 X_2 + \ldots + B_k X_k$$

    The multiple regression equation can also be written in matrix form.

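    ANOVA table

    | Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
    |---|---|---|---|---|
    | Regression | k | SSR | MSR = SSR/k | F = MSR/MSE |
    | Error | N − k − 1 | SSE | MSE = SSE/(N − k − 1) | |
    | Total | N − 1 | SST = SSR + SSE | | |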

    Statistics:

    R squared (R Square):

    $$R^2 = \frac{SSR}{SST} = \frac{kF}{(N - k - 1) + kF}$$

    ($R^2 \geq 0.81$ is considered fairly good.)

    Adjusted R squared (Adjusted R Square):

    $$R_{ii}^2 = \frac{(N-1)R^2 - k}{N - k - 1} = R^2 - \frac{k(1 - R^2)}{N - k - 1}$$

    ($R_{ii}^2$ becomes negative or undefined when $R^2$ or N is small.)
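Both identities are easy to check numerically. The sketch below plugs in the F and R Square values from the two-variable Excel output shown later in this chapter (N = 9, k = 2). The function names are assumptions made for this illustration.

```python
def r2_from_f(f, n, k):
    """R^2 = kF / ((N - k - 1) + kF)."""
    return k * f / ((n - k - 1) + k * f)

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = ((N - 1) R^2 - k) / (N - k - 1)."""
    return ((n - 1) * r2 - k) / (n - k - 1)

# Values taken from the two-variable regression output (N = 9, k = 2)
r2 = r2_from_f(131.3921, n=9, k=2)
adj = adjusted_r2(0.977677254, n=9, k=2)

# The second form of the adjusted R^2 formula gives the same number
adj_alt = 0.977677254 - 2 * (1 - 0.977677254) / (9 - 2 - 1)

print(f"R^2 from F   = {r2:.6f}")
print(f"Adjusted R^2 = {adj:.6f}")
```

With few observations the adjustment matters: the time-only fit has R Square of about 0.21 but an adjusted value of only about 0.10.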


    Standard error:

    $$S = \sqrt{\frac{SSE}{N - k - 1}}$$

    ($S \leq 0.30$ is considered fairly good.)

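    Statistical tests

    The tests parallel those for simple regression, with the following differences:

    - In the t test:

    H0: $\beta_i = 0$ (the regression coefficients are not significant)

    H1: $\beta_i \neq 0$ (at least some regression coefficients are significant)

    Degrees of freedom for t: $\gamma = N - k - 1$.

    $$t = \frac{|B_i - \beta_i|}{\sqrt{S_n^2}}; \qquad S_n^2 = \frac{S^2}{\sum (X_i - \bar{X})^2}$$

    - In the F test:

    H0: $\beta_i = 0$ (the regression equation is not adequate)

    H1: $\beta_i \neq 0$ (the regression equation is adequate for at least some $B_i$)

    Degrees of freedom for F: $v_1 = k$; $v_2 = N - k - 1$.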

    Using MS-EXCEL

    Example 17: Three temperature levels (105, 120 and 135 °C) were combined with three reaction times (15, 30 and 60 minutes) to carry out a synthesis reaction. The reaction yields (%) are given in the table below:

    | Time (min), X1 | Temperature (°C), X2 | Yield (%), Y |
    |---|---|---|
    | 15 | 105 | 1.87 |
    | 30 | 105 | 2.02 |
    | 60 | 105 | 3.28 |
    | 15 | 120 | 3.05 |
    | 30 | 120 | 4.07 |
    | 60 | 120 | 5.54 |
    | 15 | 135 | 5.03 |
    | 30 | 135 | 6.45 |
    | 60 | 135 | 7.26 |

    Is temperature and/or time linearly related to the yield of the synthesis reaction? If so, what yield is expected at 115 °C after 50 minutes?

    Entering data into the worksheet

    The data must be entered in columns:


    Using Regression

    Click the Tools menu, then the Data Analysis command.

    Select the Regression tool in the Data Analysis dialog box, then click OK.

    In the Regression dialog box, specify the following in turn:

    Range of the Y variable (Input Y Range)

    Range of the X variable(s) (Input X Range)

    Data labels (Labels)

    Confidence level (Confidence Level)

    Output location (Output Range)

    Further options such as the fitted-line plot (Line Fit Plots) and the residual plot (Residual Plots)

    [Figure: Regression dialog box]


    Regression equation $Y_{|X_1} = f(X_1)$:

    $$Y_{|X_1} = 2.73 + 0.04 X_1$$

    ($R^2 = 0.21$; $S = 1.81$)

    Regression Statistics

    | | |
    |---|---|
    | Multiple R | 0.462512069 |
    | R Square | 0.213917414 |
    | Adjusted R Square | 0.101619901 |
    | Standard Error | 1.811191587 |
    | Observations | 9 |

    ANOVA

    | | df | SS | MS | F | Significance F |
    |---|---|---|---|---|---|
    | Regression | 1 | 6.24891746 | 6.24891746 | 1.904917 | 0.209994918 |
    | Residual | 7 | 22.96290476 | 3.280414966 | | |
    | Total | 8 | 29.21182222 | | | |

    | | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
    |---|---|---|---|---|---|
    | Intercept | 2.726666667 | 1.280705853 | 2.129034282 | 0.070771 | -0.301721453 |
    | X1 | 0.044539683 | 0.032270754 | 1.38018722 | 0.209995 | -0.031768525 |

    $t_0 = 2.129 < t_{0.05} = 2.365$ (or $P_V = 0.071 > \alpha = 0.05$)

    Accept H0.

    $t_1 = 1.380 < t_{0.05} = 2.365$ (or $P_V = 0.209 > \alpha = 0.05$)

    Accept H0.

    $F = 1.905 < F_{0.05} = 5.590$ (or $F_S = 0.209 > \alpha = 0.05$)

    Accept H0.

    Thus both coefficients, 2.73 ($B_0$) and 0.04 ($B_1$), of the regression equation $Y_{|X_1} = 2.73 + 0.04 X_1$ are not statistically significant. In other words, this regression equation is not adequate.

    Conclusion: in this single-variable model, time is not linearly related to the yield of the synthesis reaction.

    Regression equation $Y_{|X_2} = f(X_2)$:

    $$Y_{|X_2} = -11.14 + 0.13 X_2$$

    ($R^2 = 0.76$; $S = 0.99$)


    Regression Statistics

    | | |
    |---|---|
    | Multiple R | 0.873933544 |
    | R Square | 0.76375984 |
    | Adjusted R Square | 0.730011246 |
    | Standard Error | 0.99290379 |
    | Observations | 9 |

    ANOVA

    | | df | SS | MS | F | Significance F |
    |---|---|---|---|---|---|
    | Regression | 1 | 22.31081667 | 22.31082 | 22.63086 | 0.002066188 |
    | Residual | 7 | 6.901005556 | 0.985858 | | |
    | Total | 8 | 29.21182222 | | | |

    | | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
    |---|---|---|---|---|---|
    | Intercept | -11.14111111 | 3.25965608 | -3.41788 | 0.011168 | -18.84897293 |
    | X2 | 0.128555556 | 0.027023418 | 4.757191 | 0.002066 | 0.064655325 |

    $|t_0| = 3.418 > t_{0.05} = 2.365$ (or $P_V = 0.011 < \alpha = 0.05$)

    Reject H0.

    $t_2 = 4.757 > t_{0.05} = 2.365$ (or $P_V = 0.00206 < \alpha = 0.05$)

    Reject H0.

    $F = 22.631 > F_{0.05} = 5.590$ (or $F_S = 0.00206 < \alpha = 0.05$)

    Reject H0.

    Thus both coefficients, -11.14 ($B_0$) and 0.13 ($B_2$), of the regression equation $Y_{|X_2} = -11.14 + 0.13 X_2$ are statistically significant. In other words, this regression equation is adequate.

    Conclusion: temperature is linearly related to the yield of the synthesis reaction.

    Regression equation $Y_{|X_1,X_2} = f(X_1, X_2)$:

    $$Y_{|X_1,X_2} = -12.70 + 0.04 X_1 + 0.13 X_2$$

    ($R^2 = 0.98$; $S = 0.33$)


    Regression Statistics

    | | |
    |---|---|
    | Multiple R | 0.988775634 |
    | R Square | 0.977677254 |
    | Adjusted R Square | 0.970236338 |
    | Standard Error | 0.329668544 |
    | Observations | 9 |

    ANOVA

    | | df | SS | MS | F | Significance F |
    |---|---|---|---|---|---|
    | Regression | 2 | 28.55973413 | 14.27987 | 131.3921 | 1.11235E-05 |
    | Residual | 6 | 0.652088095 | 0.108681 | | |
    | Total | 8 | 29.21182222 | | | |

    | | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
    |---|---|---|---|---|---|
    | Intercept | -12.7 | 1.101638961 | -11.5283 | 2.56E-05 | -15.39561342 |
    | X1 | 0.044539683 | 0.005873842 | 7.582718 | 0.000274 | 0.03016691 |
    | X2 | 0.128555556 | 0.008972441 | 14.32782 | 7.23E-06 | 0.106600783 |

    $|t_0| = 11.528 > t_{0.05} = 2.447$ for $\gamma = N - k - 1 = 6$ (or $P_V = 2.56 \times 10^{-5} < \alpha = 0.05$)

    Reject H0.

    $t_1 = 7.583 > t_{0.05} = 2.447$ (or $P_V = 0.00027 < \alpha = 0.05$)

    Reject H0.

    $t_2 = 14.328 > t_{0.05} = 2.447$ (or $P_V = 7.23 \times 10^{-6} < \alpha = 0.05$)

    Reject H0.

    $F = 131.392 > F_{0.05} = 5.140$ (or $F_S = 1.112 \times 10^{-5} < \alpha = 0.05$)

    Reject H0.

    Thus all three coefficients, -12.70 ($B_0$), 0.04 ($B_1$) and 0.13 ($B_2$), of the regression equation $Y_{|X_1,X_2} = -12.70 + 0.04 X_1 + 0.13 X_2$ are statistically significant. In other words, this regression equation is adequate.

    Conclusion: the yield of the synthesis reaction is linearly related to both factors, time and temperature.
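For readers who want to reproduce the two-variable fit outside Excel, the sketch below solves the least-squares normal equations with a small Gaussian elimination routine. This is one common way to obtain the coefficients and is not necessarily how Excel computes them internally. The helper names `solve` and `fit_multiple` are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_multiple(predictors, y):
    """Return [B0, B1, ..., Bk] from the normal equations (X'X) B = X'Y."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)

# Data of Example 17
time_min = [15, 30, 60, 15, 30, 60, 15, 30, 60]
temp_c = [105, 105, 105, 120, 120, 120, 135, 135, 135]
yield_pct = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]

coef = fit_multiple([time_min, temp_c], yield_pct)
n, k = len(yield_pct), 2
fitted = [coef[0] + coef[1] * t + coef[2] * c for t, c in zip(time_min, temp_c)]
mean_y = sum(yield_pct) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(yield_pct, fitted))
sst = sum((yi - mean_y) ** 2 for yi in yield_pct)
ssr = sst - sse
f_stat = (ssr / k) / (sse / (n - k - 1))   # MSR / MSE
s = math.sqrt(sse / (n - k - 1))           # standard error

print("B0, B1, B2 =", [round(c, 6) for c in coef])
print(f"R^2 = {ssr / sst:.6f}, S = {s:.6f}, F = {f_stat:.4f}")
```

Because time and temperature are uncorrelated in this design, B1 and B2 equal the slopes of the two single-variable fits; only the intercept and the residual variance change.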

    The linearity of the equation $Y_{|X_1,X_2} = -12.70 + 0.04 X_1 + 0.13 X_2$ can be displayed on scatter plots (scatterplots).


    To predict the reaction yield with the regression equation $Y_{|X_1,X_2} = -12.70 + 0.04 X_1 + 0.13 X_2$, select a cell, for example B21, enter the formula below, and read off the result:

    B21 = B17 + B18 * 50 + B19 * 115

    | | A | B | C | D |
    |---|---|---|---|---|
    | 17 | Intercept | -12.7 | 1.101638961 | -11.52827782 |
    | 18 | X1 | 0.044539683 | 0.005873842 | 7.582717626 |
    | 19 | X2 | 0.128555556 | 0.008972441 | 14.32782351 |
    | 20 | | | | |
    | 21 | Prediction | 4.310873016 | | |

    Note: B17 is the cell holding B0, B18 the cell holding B1 and B19 the cell holding B2; 50 is the value of X1 (time) and 115 is the value of X2 (temperature).
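The worksheet formula above amounts to evaluating the fitted equation at one point. A minimal Python equivalent, assuming the coefficients are copied from cells B17 to B19, might look like this (the function name `predict_yield` is hypothetical):

```python
def predict_yield(time_min, temp_c, b0=-12.7, b1=0.044539683, b2=0.128555556):
    """Evaluate Y = B0 + B1*X1 + B2*X2 with the coefficients from the Excel output."""
    return b0 + b1 * time_min + b2 * temp_c

# 50 minutes at 115 deg C, the conditions asked about in Example 17
predicted = predict_yield(50, 115)
print(f"Predicted yield: {predicted:.2f} %")
```

Both 50 minutes and 115 °C lie inside the ranges covered by the experiment (15 to 60 minutes, 105 to 135 °C), so this prediction is an interpolation rather than an extrapolation.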

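    APPENDIX:

    Critical values used in the test for rejecting outliers:

    | Statistic | Number of observations N | Critical value $G_P$ (P = 0.01) |
    |---|---|---|
    | N = 3 to 7: $G_1 = \frac{Y_2 - Y_1}{Y_N - Y_1}$ | 3 | 0.976 |
    | | 4 | 0.846 |
    | | 5 | 0.729 |
    | | 6 | 0.644 |
    | | 7 | 0.586 |
    | N = 8 to 13: $G_2 = \frac{Y_3 - Y_1}{Y_{N-1} - Y_1}$ | 8 | 0.780 |
    | | 9 | 0.725 |
    | | 10 | 0.678 |
    | | 11 | 0.638 |
    | | 12 | 0.605 |
    | | 13 | 0.578 |
    | N = 14 to 24: $G_3 = \frac{Y_3 - Y_1}{Y_{N-2} - Y_1}$ | 14 | 0.602 |
    | | 15 | 0.579 |
    | | 16 | 0.559 |
    | | 17 | 0.542 |
    | | 18 | 0.527 |
    | | 19 | 0.514 |
    | | 20 | 0.502 |
    | | 21 | 0.491 |
    | | 22 | 0.481 |
    | | 23 | 0.472 |
    | | 24 | 0.464 |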
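The table can be turned into a small lookup-based check. The sketch below rests on two assumptions the transcript does not spell out: the observations are sorted in ascending order so that Y1 is the suspect (lowest) value, and the suspect value is rejected when the statistic exceeds the tabulated critical value. The sample data and all names are hypothetical.

```python
# Critical values copied from the appendix table, keyed by N
CRITICAL_VALUES = {
    3: 0.976, 4: 0.846, 5: 0.729, 6: 0.644, 7: 0.586,
    8: 0.780, 9: 0.725, 10: 0.678, 11: 0.638, 12: 0.605, 13: 0.578,
    14: 0.602, 15: 0.579, 16: 0.559, 17: 0.542, 18: 0.527, 19: 0.514,
    20: 0.502, 21: 0.491, 22: 0.481, 23: 0.472, 24: 0.464,
}

def outlier_statistic(values):
    """Return G1, G2 or G3 (chosen by N) for the lowest value of the sample."""
    y = sorted(values)  # assumption: Y1 <= Y2 <= ... <= YN
    n = len(y)
    if 3 <= n <= 7:
        num, den = y[1] - y[0], y[-1] - y[0]   # (Y2 - Y1) / (YN - Y1)
    elif 8 <= n <= 13:
        num, den = y[2] - y[0], y[-2] - y[0]   # (Y3 - Y1) / (Y(N-1) - Y1)
    elif 14 <= n <= 24:
        num, den = y[2] - y[0], y[-3] - y[0]   # (Y3 - Y1) / (Y(N-2) - Y1)
    else:
        raise ValueError("the table covers N = 3 to 24 only")
    return num / den

def lowest_is_outlier(values):
    """Assumed decision rule: reject the lowest value if G exceeds the critical value."""
    return outlier_statistic(values) > CRITICAL_VALUES[len(values)]

# Hypothetical sample with one suspiciously low reading
sample = [1.0, 5.0, 5.1, 5.2, 5.3]
g = outlier_statistic(sample)
flagged = lowest_is_outlier(sample)
print(f"G = {g:.3f}, critical = {CRITICAL_VALUES[len(sample)]}, outlier: {flagged}")
```

To test the highest value instead, one option is to negate every observation before calling the function, which mirrors the ordering.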