Multiple Regression Analysis (multiple regression.pdf)



    Chapter 3

    Multiple Regression

    3.1 Multiple Linear Regression Model

    A fitted linear regression model always leaves some residual variation. There

might be another systematic cause for the variability in the observations $y_i$. If we have data on other explanatory variables we can ask whether they can be used to explain some of the residual variation in $Y$. If this is the case, we should take it into account in the model, so that the errors are purely random. We could write

\[
Y_i = \beta_0 + \beta_1 x_i + \underbrace{\beta_2 z_i + \varepsilon_i}_{\text{previously } \varepsilon_i}.
\]

$Z$ is another explanatory variable. Usually, we denote all explanatory variables (there may be more than two of them) using the letter $X$ with an index to distinguish between them, i.e., $X_1, X_2, \ldots, X_{p-1}$.

Example 3.1. (Neter et al., 1996) Dwine Studios, Inc. The company operates portrait studios in 21 cities of medium size. These studios specialize in portraits of children. The company is considering an expansion into other cities of medium size and wishes to investigate whether sales ($Y$) in a community can be predicted from the number of persons aged 16 or younger in the community ($X_1$) and the per capita disposable personal income in the community ($X_2$).

If we use just $X_2$ (per capita disposable personal income in the community) to model $Y$ (sales in the community) we obtain the following model fit.



    The regression equation is

    Y = - 352.5 + 31.17 X2

    S = 20.3863 R-Sq = 69.9% R-Sq(adj) = 68.3%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 18299.8 18299.8 44.03 0.000

    Error 19 7896.4 415.6

    Total 20 26196.2

Figure 3.1: (a) Fitted line plot for Dwine Studios versus per capita disposable personal income in the community. (b) Residual plots.

The regression is highly significant, but $R^2$ is rather small. This suggests that there could be some

    other factors, which are also important for the sales. We have data on the number of persons aged

    16 or younger in the community, so we will examine whether the residuals of the above fit are

    related to this variable. If yes, then including it in the model may improve the fit.

    Figure 3.2: The dependence of the residuals on X1.


    Indeed, the residuals show a possible relationship with the number of persons aged 16 or younger

    in the community. We will fit the model with both variables, X1 and X2 included, that is

\[
Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \qquad i = 1, \ldots, n.
\]

The model fit is the following:

    The regression equation is

    Y = - 68.9 + 1.45 X1 + 9.37 X2

    Predictor Coef SE Coef T P

    Constant -68.86 60.02 -1.15 0.266

    X1 1.4546 0.2118 6.87 0.000

    X2 9.366 4.064 2.30 0.033

    S = 11.0074 R-Sq = 91.7% R-Sq(adj) = 90.7%

    Analysis of Variance

    Source DF SS MS F P

    Regression 2 24015 12008 99.10 0.000

    Residual Error 18 2181 121

    Total 20 26196

Here we see that the intercept parameter is not significantly different from zero (p = 0.266) and so the model without the intercept was fitted. $R^2$ is now close to 100% and both parameters are highly significant.

    Regression Equation

    Y = 1.62 X1 + 4.75 X2

    Coefficients

    Term Coef SE Coef T P

    X1 1.62175 0.154948 10.4664 0.000

    X2 4.75042 0.583246 8.1448 0.000

    S = 11.0986 R-Sq = 99.68% R-Sq(adj) = 99.64%

    Analysis of Variance

    Source DF Seq SS Adj SS Adj MS F P

    Regression 2 718732 718732 359366 2917.42 0.000

    Error 19 2340 2340 123

    Total 21 721072
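The two fits above can be reproduced numerically. The following Python/NumPy sketch is not part of the original Minitab analysis; since the Dwine Studios data are not listed in these notes, it uses synthetic stand-in values, but with the real data the coefficients would match the Minitab output.

import numpy as np

def ls_fit(X, y):
    """Least squares coefficients for a given design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 21
x1 = rng.uniform(40, 90, n)        # stand-in for persons aged 16 or younger
x2 = rng.uniform(16, 21, n)        # stand-in for per capita disposable income
y = -68.9 + 1.45 * x1 + 9.37 * x2 + rng.normal(0, 11, n)  # stand-in sales

X_with_intercept = np.column_stack([np.ones(n), x1, x2])
X_no_intercept = np.column_stack([x1, x2])

print(ls_fit(X_with_intercept, y))   # with intercept: (b0, b1, b2)
print(ls_fit(X_no_intercept, y))     # without intercept: (b1, b2)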


    Figure 3.3: Fitted surface plot and the Dwine Studios observations.

A Multiple Linear Regression (MLR) model for a response variable $Y$ and explanatory variables $X_1, X_2, \ldots, X_{p-1}$ is
\[
\begin{aligned}
E(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i}) &= \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i}, \\
\operatorname{var}(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i}) &= \sigma^2, \quad i = 1, \ldots, n, \\
\operatorname{cov}(Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i},\; Y \mid X_1 = x_{1j}, \ldots, X_{p-1} = x_{p-1,j}) &= 0, \quad i \neq j.
\end{aligned}
\]
As in the SLR model we denote
\[
Y_i = (Y \mid X_1 = x_{1i}, \ldots, X_{p-1} = x_{p-1,i})
\]
and we usually omit the condition on the $X$s and write
\[
\begin{aligned}
\mu_i = E(Y_i) &= \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i}, \\
\operatorname{var}(Y_i) &= \sigma^2, \quad i = 1, \ldots, n, \\
\operatorname{cov}(Y_i, Y_j) &= 0, \quad i \neq j,
\end{aligned}
\]
or
\[
\begin{aligned}
Y_i &= \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i} + \varepsilon_i, \\
E(\varepsilon_i) &= 0, \\
\operatorname{var}(\varepsilon_i) &= \sigma^2, \quad i = 1, \ldots, n, \\
\operatorname{cov}(\varepsilon_i, \varepsilon_j) &= 0, \quad i \neq j.
\end{aligned}
\]
For testing we need the assumption of Normality, i.e., we assume that
\[
Y_i \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2)
\]


or
\[
\varepsilon_i \overset{\text{ind}}{\sim} N(0, \sigma^2).
\]
To simplify the notation we write the MLR model in a matrix form
\[
\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{3.1}
\]
that is,
\[
\underbrace{\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}}_{=: \boldsymbol{Y}}
=
\underbrace{\begin{pmatrix}
1 & x_{1,1} & \cdots & x_{p-1,1} \\
1 & x_{1,2} & \cdots & x_{p-1,2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1,n} & \cdots & x_{p-1,n}
\end{pmatrix}}_{=: X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}}_{=: \boldsymbol{\beta}}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{=: \boldsymbol{\varepsilon}}
\]
Here $\boldsymbol{Y}$ is the vector of responses, $X$ is often called the design matrix, $\boldsymbol{\beta}$ is the vector of unknown, constant parameters and $\boldsymbol{\varepsilon}$ is the vector of random errors. The $\varepsilon_i$ are independent and identically distributed, that is
\[
\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}_n, \sigma^2 I_n).
\]
Note that the properties of the errors give
\[
\boldsymbol{Y} \sim N_n(X\boldsymbol{\beta}, \sigma^2 I_n).
\]
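As a numerical illustration of the matrix form (3.1) (an added sketch, not part of the original notes), the following Python code simulates responses from $\boldsymbol{Y} \sim N_n(X\boldsymbol{\beta}, \sigma^2 I_n)$; the dimensions and parameter values are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3                                # n observations, p parameters (beta0, beta1, beta2)
X = np.column_stack([np.ones(n),            # column of ones for the intercept
                     rng.normal(size=(n, p - 1))])
beta = np.array([2.0, 1.0, -0.5])           # arbitrary "true" parameter vector
sigma = 1.5

eps = rng.normal(0.0, sigma, size=n)        # eps ~ N_n(0, sigma^2 I_n)
Y = X @ beta + eps                          # Y ~ N_n(X beta, sigma^2 I_n)
print(Y[:3])                                # first few simulated responses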

    3.2 Least squares estimation

To derive the least squares estimator (LSE) for the parameter vector $\boldsymbol{\beta}$ we minimise the sum of squares of the errors, that is
\[
\begin{aligned}
S(\boldsymbol{\beta}) &= \sum_{i=1}^{n}\left[Y_i - \{\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_{p-1} x_{p-1,i}\}\right]^2 \\
&= \sum \varepsilon_i^2 \\
&= \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} \\
&= (\boldsymbol{Y} - X\boldsymbol{\beta})^T(\boldsymbol{Y} - X\boldsymbol{\beta}) \\
&= (\boldsymbol{Y}^T - \boldsymbol{\beta}^T X^T)(\boldsymbol{Y} - X\boldsymbol{\beta}) \\
&= \boldsymbol{Y}^T\boldsymbol{Y} - \boldsymbol{Y}^T X\boldsymbol{\beta} - \boldsymbol{\beta}^T X^T\boldsymbol{Y} + \boldsymbol{\beta}^T X^T X\boldsymbol{\beta} \\
&= \boldsymbol{Y}^T\boldsymbol{Y} - 2\boldsymbol{\beta}^T X^T\boldsymbol{Y} + \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}.
\end{aligned}
\]


Theorem 3.1. The LSE $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ is given by
\[
\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T\boldsymbol{Y}
\]
if $X^T X$ is non-singular. If $X^T X$ is singular there is no unique LSE of $\boldsymbol{\beta}$.

Proof. Let $\boldsymbol{\beta}_0$ be any solution of $X^T X\boldsymbol{\beta} = X^T\boldsymbol{Y}$. Then $X^T X\boldsymbol{\beta}_0 = X^T\boldsymbol{Y}$ and
\[
\begin{aligned}
S(\boldsymbol{\beta}) - S(\boldsymbol{\beta}_0)
&= \boldsymbol{Y}^T\boldsymbol{Y} - 2\boldsymbol{\beta}^T X^T\boldsymbol{Y} + \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}
 - \boldsymbol{Y}^T\boldsymbol{Y} + 2\boldsymbol{\beta}_0^T X^T\boldsymbol{Y} - \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 \\
&= -2\boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0 + \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}
 + 2\boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 \\
&= \boldsymbol{\beta}^T X^T X\boldsymbol{\beta} - 2\boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0 + \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 \\
&= \boldsymbol{\beta}^T X^T X\boldsymbol{\beta} - \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0 - \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0 + \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 \\
&= \boldsymbol{\beta}^T X^T X\boldsymbol{\beta} - \boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0 - \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta} + \boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}_0 \\
&\qquad\text{(since $\boldsymbol{\beta}^T X^T X\boldsymbol{\beta}_0$ is a scalar and equals its transpose $\boldsymbol{\beta}_0^T X^T X\boldsymbol{\beta}$)} \\
&= \boldsymbol{\beta}^T(X^T X\boldsymbol{\beta} - X^T X\boldsymbol{\beta}_0) - \boldsymbol{\beta}_0^T(X^T X\boldsymbol{\beta} - X^T X\boldsymbol{\beta}_0) \\
&= (\boldsymbol{\beta}^T - \boldsymbol{\beta}_0^T)(X^T X\boldsymbol{\beta} - X^T X\boldsymbol{\beta}_0) \\
&= (\boldsymbol{\beta} - \boldsymbol{\beta}_0)^T X^T X(\boldsymbol{\beta} - \boldsymbol{\beta}_0) \\
&= \{X(\boldsymbol{\beta} - \boldsymbol{\beta}_0)\}^T\{X(\boldsymbol{\beta} - \boldsymbol{\beta}_0)\} \ge 0,
\end{aligned}
\]
since it is a sum of squares of the elements of the vector $X(\boldsymbol{\beta} - \boldsymbol{\beta}_0)$.

We have shown that $S(\boldsymbol{\beta}) - S(\boldsymbol{\beta}_0) \ge 0$. Hence $\boldsymbol{\beta}_0$ minimises $S(\boldsymbol{\beta})$, i.e. any solution of $X^T X\boldsymbol{\beta} = X^T\boldsymbol{Y}$ minimises $S(\boldsymbol{\beta})$.

If $X^T X$ is non-singular the unique solution is $\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T\boldsymbol{Y}$. If $X^T X$ is singular there is no unique solution.
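A quick numerical check of Theorem 3.1 on simulated data (a sketch, not part of the original notes): the closed form $(X^T X)^{-1}X^T\boldsymbol{Y}$ agrees with NumPy's least squares routine. In practice one solves the normal equations, or uses a QR/SVD-based routine such as np.linalg.lstsq, rather than forming the inverse explicitly.

import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

beta_closed_form = np.linalg.inv(X.T @ X) @ X.T @ Y    # (X^T X)^{-1} X^T Y
beta_normal_eqns = np.linalg.solve(X.T @ X, X.T @ Y)   # solve X^T X b = X^T Y
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_closed_form, beta_lstsq),
      np.allclose(beta_normal_eqns, beta_lstsq))       # True True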


Note that, as we did for the SLM in Chapter 2, it is possible to obtain this result by differentiating $S(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ and setting it equal to $\mathbf{0}$.

    3.2.1 Properties of the least squares estimator

Theorem 3.2. If
\[
\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma^2 I),
\]
then
\[
\hat{\boldsymbol{\beta}} \sim N_p(\boldsymbol{\beta}, \sigma^2(X^T X)^{-1}).
\]

Proof. Each element of $\hat{\boldsymbol{\beta}}$ is a linear function of $Y_1, \ldots, Y_n$. We assume that the $Y_i$, $i = 1, \ldots, n$, are normally distributed. Hence $\hat{\boldsymbol{\beta}}$ is also normally distributed. The expectation and variance-covariance matrix can be shown in the same way as in Theorem 2.7.

Remark 3.1. The vector of fitted values is given by
\[
\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{Y}} = X\hat{\boldsymbol{\beta}} = X(X^T X)^{-1} X^T\boldsymbol{Y} = H\boldsymbol{Y}.
\]
The matrix $H = X(X^T X)^{-1} X^T$ is called the hat matrix.

Note that
\[
H^T = H
\]
and also
\[
HH = X(X^T X)^{-1}\underbrace{X^T X(X^T X)^{-1}}_{=I} X^T = X(X^T X)^{-1} X^T = H.
\]
A matrix which satisfies the condition $AA = A$ is called an idempotent matrix. Note that if $A$ is idempotent, then $(I - A)$ is also idempotent.
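The properties $H^T = H$ and $HH = H$ are easy to verify numerically; a minimal sketch on arbitrary simulated data (an illustration added here, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix H = X (X^T X)^{-1} X^T
I = np.eye(n)

print(np.allclose(H, H.T))                    # symmetry: H^T = H
print(np.allclose(H @ H, H))                  # idempotency: HH = H
print(np.allclose((I - H) @ (I - H), I - H))  # I - H is also idempotent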


We now prove some results about the residual vector
\[
\boldsymbol{e} = \boldsymbol{Y} - \hat{\boldsymbol{Y}} = \boldsymbol{Y} - H\boldsymbol{Y} = (I - H)\boldsymbol{Y}.
\]
As in Theorem 2.8, here we have

Lemma 3.1. $E(\boldsymbol{e}) = \mathbf{0}$.

Proof.
\[
E(\boldsymbol{e}) = (I - H)E(\boldsymbol{Y}) = (I - X(X^T X)^{-1} X^T)X\boldsymbol{\beta} = X\boldsymbol{\beta} - X\boldsymbol{\beta} = \mathbf{0}.
\]

Lemma 3.2. $\operatorname{Var}(\boldsymbol{e}) = \sigma^2(I - H)$.

Proof.
\[
\operatorname{Var}(\boldsymbol{e}) = (I - H)\operatorname{var}(\boldsymbol{Y})(I - H)^T = (I - H)\,\sigma^2 I\,(I - H) = \sigma^2(I - H),
\]
since $(I - H)$ is symmetric and idempotent.

Lemma 3.3. The sum of squares of the residuals is $\boldsymbol{Y}^T(I - H)\boldsymbol{Y}$.

Proof.
\[
\sum_{i=1}^{n} e_i^2 = \boldsymbol{e}^T\boldsymbol{e} = \boldsymbol{Y}^T(I - H)^T(I - H)\boldsymbol{Y} = \boldsymbol{Y}^T(I - H)\boldsymbol{Y}.
\]

Lemma 3.4. The elements of the residual vector $\boldsymbol{e}$ sum to zero, i.e.
\[
\sum_{i=1}^{n} e_i = 0.
\]


Proof. We will prove this by contradiction.

Assume that $\sum e_i = nc$ where $c \neq 0$. Then
\[
\begin{aligned}
\sum e_i^2 &= \sum\{(e_i - c) + c\}^2 \\
&= \sum(e_i - c)^2 + 2c\sum(e_i - c) + nc^2 \\
&= \sum(e_i - c)^2 + 2c\bigl(\underbrace{\textstyle\sum e_i}_{=nc} - nc\bigr) + nc^2 \\
&= \sum(e_i - c)^2 + nc^2 \\
&> \sum(e_i - c)^2.
\end{aligned}
\]
But we know that $\sum e_i^2$ is the minimum value of $S(\boldsymbol{\beta})$, so there cannot exist values with a smaller sum of squares and this gives the required contradiction. So $c = 0$.

Corollary 3.1.
\[
\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i = \bar{Y}.
\]

Proof. The residuals are $e_i = Y_i - \hat{Y}_i$, so $\sum e_i = \sum(Y_i - \hat{Y}_i)$, but $\sum e_i = 0$. Hence $\sum Y_i = \sum\hat{Y}_i$ and so the result follows.
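Lemma 3.4 and Corollary 3.1 can likewise be checked numerically. The sketch below uses arbitrary simulated data; the checks succeed because the design matrix contains a column of ones, as assumed throughout this chapter.

import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y                    # residual vector e = (I - H) Y
Y_hat = H @ Y                              # fitted values

print(np.isclose(e.sum(), 0.0))            # Lemma 3.4: residuals sum to zero
print(np.isclose(Y_hat.mean(), Y.mean()))  # Corollary 3.1: mean of fitted = Ybar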

    3.3 Analysis of Variance

    We begin this section by proving the basic Analysis of Variance identity.

Theorem 3.3. The total sum of squares splits into the regression sum of squares and the residual sum of squares, that is
\[
SS_T = SS_R + SS_E.
\]

Proof.
\[
SS_T = \sum(Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2 = \boldsymbol{Y}^T\boldsymbol{Y} - n\bar{Y}^2.
\]


\[
\begin{aligned}
SS_R &= \sum(\hat{Y}_i - \bar{Y})^2
= \sum\hat{Y}_i^2 - 2\bar{Y}\underbrace{\textstyle\sum\hat{Y}_i}_{=n\bar{Y}} + n\bar{Y}^2
= \sum\hat{Y}_i^2 - n\bar{Y}^2 \\
&= \hat{\boldsymbol{Y}}^T\hat{\boldsymbol{Y}} - n\bar{Y}^2
= \hat{\boldsymbol{\beta}}^T X^T X\hat{\boldsymbol{\beta}} - n\bar{Y}^2 \\
&= \boldsymbol{Y}^T X(X^T X)^{-1}\underbrace{X^T X(X^T X)^{-1}}_{=I} X^T\boldsymbol{Y} - n\bar{Y}^2 \\
&= \boldsymbol{Y}^T H\boldsymbol{Y} - n\bar{Y}^2.
\end{aligned}
\]
We have seen (Lemma 3.3) that
\[
SS_E = \boldsymbol{Y}^T(I - H)\boldsymbol{Y}
\]
and so
\[
SS_R + SS_E = \boldsymbol{Y}^T H\boldsymbol{Y} - n\bar{Y}^2 + \boldsymbol{Y}^T(I - H)\boldsymbol{Y} = \boldsymbol{Y}^T\boldsymbol{Y} - n\bar{Y}^2 = SS_T.
\]

    F-test for the Overall Significance of Regression

Suppose we wish to test the hypothesis
\[
H_0: \beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0,
\]
i.e. all coefficients except $\beta_0$ are zero, versus
\[
H_1: \neg H_0,
\]
which means that at least one of the coefficients is non-zero. Under $H_0$ the model reduces to the null model
\[
\boldsymbol{Y} = \mathbf{1}\beta_0 + \boldsymbol{\varepsilon},
\]


where $\mathbf{1}$ is a vector of ones.

    In testing H0 we are asking if there is sufficient evidence to reject the null model.

    The Analysis of variance table is given by

\[
\begin{array}{lcccc}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} & \text{VR} \\[2pt]
\text{Overall regression} & p-1 & \boldsymbol{Y}^T H\boldsymbol{Y} - n\bar{Y}^2 & \dfrac{SS_R}{p-1} & \dfrac{MS_R}{MS_E} \\[6pt]
\text{Residual} & n-p & \boldsymbol{Y}^T(I - H)\boldsymbol{Y} & \dfrac{SS_E}{n-p} & \\[6pt]
\text{Total} & n-1 & \boldsymbol{Y}^T\boldsymbol{Y} - n\bar{Y}^2 & &
\end{array}
\]

As in the SLM we have $n-1$ total degrees of freedom. Fitting a linear model with $p$ parameters $(\beta_0, \beta_1, \ldots, \beta_{p-1})$ leaves $n-p$ residual d.f. The regression d.f. are then $n - 1 - (n - p) = p - 1$.

It can be shown that $E(SS_E) = (n-p)\sigma^2$, that is, $MS_E$ is an unbiased estimator of $\sigma^2$. Also,
\[
\frac{SS_E}{\sigma^2} \sim \chi^2_{n-p}
\]
and if $\beta_1 = \ldots = \beta_{p-1} = 0$, then
\[
\frac{SS_R}{\sigma^2} \sim \chi^2_{p-1}.
\]
The two statistics are independent, hence
\[
\frac{MS_R}{MS_E} \overset{H_0}{\sim} F_{p-1,\,n-p}.
\]
This is a test function for the null hypothesis
\[
H_0: \beta_1 = \beta_2 = \ldots = \beta_{p-1} = 0
\]
versus
\[
H_1: \neg H_0.
\]
We reject $H_0$ at the $100\alpha\%$ level of significance if
\[
F_{obs} > F_{\alpha;\,p-1,\,n-p},
\]
where $F_{\alpha;\,p-1,\,n-p}$ is such that $P(F < F_{\alpha;\,p-1,\,n-p}) = 1 - \alpha$.
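A sketch of how the ANOVA quantities and the overall F-test could be computed directly from the formulas above (simulated data, added here for illustration; SciPy is used only for the $F_{p-1,\,n-p}$ tail probability):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
Ybar = Y.mean()

SST = Y @ Y - n * Ybar**2                  # Y^T Y - n Ybar^2
SSR = Y @ H @ Y - n * Ybar**2              # Y^T H Y - n Ybar^2
SSE = Y @ (np.eye(n) - H) @ Y              # Y^T (I - H) Y

MSR, MSE = SSR / (p - 1), SSE / (n - p)
F = MSR / MSE
p_value = stats.f.sf(F, p - 1, n - p)      # P(F_{p-1, n-p} > F_obs)

print(np.isclose(SST, SSR + SSE))          # ANOVA identity
print(F, p_value)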


    3.4 Inferences about the parameters

    In Theorem 3.2 we have seen that

\[
\hat{\boldsymbol{\beta}} \sim N_p(\boldsymbol{\beta}, \sigma^2(X^T X)^{-1}).
\]
Therefore $\hat{\beta}_j \sim N(\beta_j, \sigma^2 c_{jj})$, $j = 0, 1, 2, \ldots, p-1$, where $c_{jj}$ is the $j$th diagonal element of $(X^T X)^{-1}$ (counting from 0 to $p-1$). Hence, it is straightforward to make inferences about $\beta_j$ in the usual way.

A $100(1-\alpha)\%$ confidence interval for $\beta_j$ is
\[
\hat{\beta}_j \pm t_{\frac{\alpha}{2},\,n-p}\sqrt{S^2 c_{jj}},
\]
where $S^2 = MS_E$.

The test statistic for $H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$ is
\[
T = \frac{\hat{\beta}_j}{\sqrt{S^2 c_{jj}}} \sim t_{n-p} \quad \text{if } H_0 \text{ is true}.
\]

Care is needed in interpreting the confidence intervals and tests. They refer only to the model we are currently fitting. Thus not rejecting $H_0: \beta_j = 0$ does not mean that $X_j$ has no explanatory power; it means that, conditional on $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{p-1}$ being in the model, $X_j$ has no additional power.

It is often best to think of the test as comparing the models without and with $X_j$, i.e.
\[
H_0: E(Y_i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_{j-1} x_{j-1,i} + \beta_{j+1} x_{j+1,i} + \cdots + \beta_{p-1} x_{p-1,i}
\]
versus
\[
H_1: E(Y_i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_{p-1} x_{p-1,i}.
\]
It does not tell us anything about the comparison between the models $E(Y_i) = \beta_0$ and $E(Y_i) = \beta_0 + \beta_j x_{j,i}$.
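The standard errors, t-statistics and confidence intervals for the individual coefficients follow directly from $\hat{\beta}_j \sim N(\beta_j, \sigma^2 c_{jj})$. A sketch on simulated data (added illustration, not part of the original notes):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
S2 = resid @ resid / (n - p)               # S^2 = MS_E

c_jj = np.diag(XtX_inv)                    # c_00, c_11, ..., c_{p-1,p-1}
se = np.sqrt(S2 * c_jj)                    # standard errors of beta_hat_j
t_stats = beta_hat / se                    # T = beta_hat_j / sqrt(S^2 c_jj)
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(np.round(ci, 3))                     # 95% CIs for beta_0, ..., beta_{p-1}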

3.5 Confidence interval for $\mu_0$

We have
\[
E(\boldsymbol{Y}) = \boldsymbol{\mu} = X\boldsymbol{\beta}.
\]


As with simple linear regression, we might want to estimate the expected response at a specific $\boldsymbol{x}$, say $\boldsymbol{x}_0 = (1, x_{1,0}, \ldots, x_{p-1,0})^T$, i.e.
\[
\mu_0 = E(Y \mid X_1 = x_{1,0}, \ldots, X_{p-1} = x_{p-1,0}).
\]
The point estimate will be $\hat{\mu}_0 = \boldsymbol{x}_0^T\hat{\boldsymbol{\beta}}$. Assuming normality, as usual, we can obtain a confidence interval for $\mu_0$.

Theorem 3.4.
\[
\hat{\mu}_0 \sim N(\mu_0, \sigma^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0).
\]

Proof.

(i) $\hat{\mu}_0 = \boldsymbol{x}_0^T\hat{\boldsymbol{\beta}}$ is a linear combination of $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_{p-1}$, each of which is normal. Hence $\hat{\mu}_0$ is also normal.

(ii)
\[
E(\hat{\mu}_0) = E(\boldsymbol{x}_0^T\hat{\boldsymbol{\beta}}) = \boldsymbol{x}_0^T E(\hat{\boldsymbol{\beta}}) = \boldsymbol{x}_0^T\boldsymbol{\beta} = \mu_0.
\]

(iii)
\[
\operatorname{Var}(\hat{\mu}_0) = \operatorname{Var}(\boldsymbol{x}_0^T\hat{\boldsymbol{\beta}}) = \boldsymbol{x}_0^T\operatorname{Var}(\hat{\boldsymbol{\beta}})\boldsymbol{x}_0 = \sigma^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0.
\]

The following corollary is a consequence of Theorem 3.4.

Corollary 3.2. A $100(1-\alpha)\%$ confidence interval for $\mu_0$ is
\[
\hat{\mu}_0 \pm t_{\frac{\alpha}{2},\,n-p}\sqrt{S^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0}.
\]
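Corollary 3.2 translates directly into code; the sketch below (an added illustration) computes a 95% confidence interval for the mean response at an arbitrary point $\boldsymbol{x}_0$, again on simulated data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat)**2) / (n - p)

x0 = np.array([1.0, 0.5, -1.0])            # (1, x_{1,0}, x_{2,0}) - arbitrary point
mu0_hat = x0 @ beta_hat
se_mu0 = np.sqrt(S2 * x0 @ XtX_inv @ x0)   # sqrt(S^2 x0^T (X^T X)^{-1} x0)

t_crit = stats.t.ppf(0.975, df=n - p)
print(mu0_hat - t_crit * se_mu0, mu0_hat + t_crit * se_mu0)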


    3.6 Predicting a new observation

    To predict a new observation we need to take into account not only its expectation,

    but also a possible new random error.

The point estimator of a new observation
\[
Y_0 = (Y \mid X_1 = x_{1,0}, \ldots, X_{p-1} = x_{p-1,0}) = \mu_0 + \varepsilon_0
\]
is
\[
\hat{Y}_0 = \boldsymbol{x}_0^T\hat{\boldsymbol{\beta}} \;(= \hat{\mu}_0),
\]
which, assuming normality, is such that
\[
\hat{Y}_0 \sim N(\mu_0, \sigma^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0).
\]
Then
\[
\hat{Y}_0 - \mu_0 \sim N(0, \sigma^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0)
\]
and
\[
\hat{Y}_0 - (\mu_0 + \varepsilon_0) \sim N(0, \sigma^2\boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0 + \sigma^2).
\]
That is,
\[
\hat{Y}_0 - Y_0 \sim N(0, \sigma^2\{1 + \boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0\})
\]
and hence
\[
\frac{\hat{Y}_0 - Y_0}{\sqrt{\sigma^2\{1 + \boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0\}}} \sim N(0, 1).
\]
As usual we estimate $\sigma^2$ by $S^2$ and get
\[
\frac{\hat{Y}_0 - Y_0}{\sqrt{S^2\{1 + \boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0\}}} \sim t_{n-p}.
\]
Hence a $100(1-\alpha)\%$ prediction interval for $Y_0$ is given by
\[
\hat{Y}_0 \pm t_{\frac{\alpha}{2},\,n-p}\sqrt{S^2\{1 + \boldsymbol{x}_0^T(X^T X)^{-1}\boldsymbol{x}_0\}}.
\]
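The corresponding prediction interval differs from the confidence interval for $\mu_0$ only in the extra $1+$ inside the square root, which accounts for the new error $\varepsilon_0$; a sketch (same simulated setup as before):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat)**2) / (n - p)

x0 = np.array([1.0, 0.5, -1.0])
Y0_hat = x0 @ beta_hat
se_pred = np.sqrt(S2 * (1.0 + x0 @ XtX_inv @ x0))   # note the extra "1 +"

t_crit = stats.t.ppf(0.975, df=n - p)
print(Y0_hat - t_crit * se_pred, Y0_hat + t_crit * se_pred)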


    3.7 Model Building

    We have already mentioned the principle of parsimony; we should use the simplest

    model that achieves our purpose.

It is easy to get a simple model ($Y_i = \beta_0 + \varepsilon_i$) and it is easy to represent the response by the data themselves. However, the first is generally too simple and

    the second is not a useful model. Achieving a simple model that describes the

    data well is something of an art. Often, there is more than one model which does

    a reasonable job.

Example 3.2. Sales. A company is interested in the dependence of sales on promotional expenditure ($X_1$, in 1000), the number of active accounts ($X_2$), the district potential ($X_3$, coded), and the number of competing brands ($X_4$). We will try to find a good multiple regression model for the response variable $Y$ (sales).

Data on last year's sales ($Y$, in 100,000) in 15 sales districts are given in the file

    Sales.txt on the course website.

Figure 3.4: The Matrix Plot indicates that $Y$ is clearly related to $X_4$ and also to $X_2$. The relation with the other explanatory variables is not that obvious.


Let us start with fitting a simple regression model of Y as a function of X4 only.

    The regression equation is

    Y = 396 - 25.1 X4

    Predictor Coef SE Coef T P

    Constant 396.07 49.25 8.04 0.000

    X4 -25.051 5.242 -4.78 0.000

    S = 49.9868 R-Sq = 63.7% R-Sq(adj) = 60.9%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 57064 57064 22.84 0.000

    Residual Error 13 32483 2499

    Total 14 89547

We can see that the plot of residuals versus fitted values indicates that there may be non-constant variance, and the linearity of the model is also in question. We will add X2 to the model.

The regression equation is

Y = 190 - 22.3 X4 + 3.57 X2

    Predictor Coef SE Coef T P

    Constant 189.83 10.13 18.74 0.000

    X4 -22.2744 0.7076 -31.48 0.000

    X2 3.5692 0.1333 26.78 0.000

    S = 6.67497 R-Sq = 99.4% R-Sq(adj) = 99.3%

    Analysis of Variance


    Source DF SS MS F P

Regression 2 89012 44506 998.90 0.000

Residual Error 12 535 45

    Total 14 89547

    Source DF Seq SS

    X4 1 57064

    X2 1 31948

Still, there is some evidence that the standardized residuals may not have constant variance. Will this change if we add X3 to the model?

    The regression equation is

    Y = 190 - 22.3 X4 + 3.56 X2 + 0.049 X3

    Predictor Coef SE Coef T P

    Constant 189.60 10.76 17.62 0.000

    X4 -22.2679 0.7408 -30.06 0.000

    X2 3.5633 0.1482 24.05 0.000

    X3 0.0491 0.4290 0.11 0.911

    S = 6.96763 R-Sq = 99.4% R-Sq(adj) = 99.2%

    Analysis of Variance

    Source DF SS MS F P

    Regression 3 89013 29671 611.17 0.000

    Residual Error 11 534 49

    Total 14 89547

    Source DF Seq SS

    X4 1 57064

    X2 1 31948

    X3 1 1


Not much better than before. Now we add X1, the explanatory variable least related to Y.

    The regression equation is

    Y = 177 - 22.2 X4 + 3.54 X2 + 0.204 X3 + 2.17 X1

    Predictor Coef SE Coef T P

    Constant 177.229 8.787 20.17 0.000

    X4 -22.1583 0.5454 -40.63 0.000

    X2 3.5380 0.1092 32.41 0.000

    X3 0.2035 0.3189 0.64 0.538

    X1 2.1702 0.6737 3.22 0.009

    S = 5.11930 R-Sq = 99.7% R-Sq(adj) = 99.6%

    Analysis of Variance

    Source DF SS MS F P

    Regression 4 89285 22321 851.72 0.000

    Residual Error 10 262 26

    Total 14 89547

    Source DF Seq SS

X4 1 57064

X2 1 31948

    X3 1 1

    X1 1 272


    The residuals now do not contradict the model assumptions. We analyze the numerical output.

    Here we see that X3 may be a redundant variable as we have no evidence to reject the hypothesis

that $\beta_3 = 0$ given that all the other variables are in the model. Hence, we will fit a new model

    without X3.

    The regression equation is

    Y = 179 - 22.2 X4 + 3.56 X2 + 2.11 X1

    Predictor Coef SE Coef T P

Constant 178.521 8.318 21.46 0.000

    X4 -22.1880 0.5286 -41.98 0.000

    X2 3.56240 0.09945 35.82 0.000

    X1 2.1055 0.6479 3.25 0.008

    S = 4.97952 R-Sq = 99.7% R-Sq(adj) = 99.6%

    Analysis of Variance

    Source DF SS MS F P

    Regression 3 89274 29758 1200.14 0.000

    Residual Error 11 273 25

    Total 14 89547

    Source DF Seq SS

    X4 1 57064

    X2 1 31948

    X1 1 262


    These residual plots also do not contradict the model assumptions.

On its own, variable X1 explains only 1% of the variation, but once X2 and X4 are included in the model, X1 is significant and also seems to cure the problems with normality and non-constant variance.
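The final model for the Sales data could be reproduced along the following lines. This is only a sketch: it assumes that Sales.txt is whitespace-delimited with columns Y, X1, X2, X3, X4 in that order and has no header row; the actual layout of the file on the course website may differ.

import numpy as np

# Assumed layout: one row per district, columns Y X1 X2 X3 X4 (whitespace-delimited).
data = np.loadtxt("Sales.txt")
Y = data[:, 0]
X1, X2, X4 = data[:, 1], data[:, 2], data[:, 4]

n = len(Y)
X = np.column_stack([np.ones(n), X4, X2, X1])   # final model: intercept, X4, X2, X1
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)   # compare with 179, -22.2, 3.56, 2.11 in the output above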

    3.7.1 F-test for the deletion of a subset of variables

Suppose the overall regression model as tested by the Analysis of Variance table is significant. We know that not all of the parameters are zero, but we may still be able to delete several variables.

We can carry out the Subset Test based on the extra sum of squares principle. We are asking if we can reduce the set of regressors
\[
X_1, X_2, \ldots, X_{p-1}
\]
to, say,
\[
X_1, X_2, \ldots, X_{q-1}
\]
(renumbering if necessary), where $q < p$, by omitting $X_q, X_{q+1}, \ldots, X_{p-1}$.

We are interested in whether the inclusion of $X_q, X_{q+1}, \ldots, X_{p-1}$ in the model provides a significant increase in the overall regression sum of squares or, equivalently, a significant decrease in the residual sum of squares.

The difference between the sums of squares is called the extra sum of squares due to $X_q, \ldots, X_{p-1}$ given $X_1, \ldots, X_{q-1}$ are already in the model and is defined by the equation


\[
\begin{aligned}
SS(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})
&= \underbrace{SS(X_1, X_2, \ldots, X_{p-1})}_{\substack{\text{regression SS for}\\ \text{full model}}}
 - \underbrace{SS(X_1, X_2, \ldots, X_{q-1})}_{\substack{\text{regression SS for}\\ \text{reduced model}}} \\
&= \underbrace{SS_E^{(\mathrm{red})}}_{\substack{\text{residual SS under}\\ \text{reduced model}}}
 - \underbrace{SS_E^{(\mathrm{full})}}_{\substack{\text{residual SS under}\\ \text{full model}}}.
\end{aligned}
\]

Notation: Let
\[
\boldsymbol{\beta}_1^T = (\beta_0, \beta_1, \ldots, \beta_{q-1}), \qquad
\boldsymbol{\beta}_2^T = (\beta_q, \beta_{q+1}, \ldots, \beta_{p-1}),
\]
so that
\[
\boldsymbol{\beta} = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix}.
\]
Similarly divide $X$ into two submatrices $X_1$ and $X_2$ so that $X = (X_1, X_2)$, where
\[
X_1 = \begin{pmatrix}
1 & x_{1,1} & \cdots & x_{q-1,1} \\
\vdots & \vdots & & \vdots \\
1 & x_{1,n} & \cdots & x_{q-1,n}
\end{pmatrix}, \qquad
X_2 = \begin{pmatrix}
x_{q,1} & \cdots & x_{p-1,1} \\
\vdots & & \vdots \\
x_{q,n} & \cdots & x_{p-1,n}
\end{pmatrix}.
\]
The full model
\[
\boldsymbol{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} = X_1\boldsymbol{\beta}_1 + X_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}
\]
has
\[
SS_R^{(\mathrm{full})} = \boldsymbol{Y}^T H\boldsymbol{Y} - n\bar{Y}^2 = \hat{\boldsymbol{\beta}}^T X^T\boldsymbol{Y} - n\bar{Y}^2,
\qquad
SS_E^{(\mathrm{full})} = \boldsymbol{Y}^T(I - H)\boldsymbol{Y} = \boldsymbol{Y}^T\boldsymbol{Y} - \hat{\boldsymbol{\beta}}^T X^T\boldsymbol{Y}.
\]
Similarly the reduced model
\[
\boldsymbol{Y} = X_1\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}
\]
has
\[
SS_R^{(\mathrm{red})} = \hat{\boldsymbol{\beta}}_1^T X_1^T\boldsymbol{Y} - n\bar{Y}^2,
\qquad
SS_E^{(\mathrm{red})} = \boldsymbol{Y}^T\boldsymbol{Y} - \hat{\boldsymbol{\beta}}_1^T X_1^T\boldsymbol{Y}.
\]
Hence the extra sum of squares is
\[
SS_{\mathrm{extra}} = \hat{\boldsymbol{\beta}}^T X^T\boldsymbol{Y} - \hat{\boldsymbol{\beta}}_1^T X_1^T\boldsymbol{Y}.
\]
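A sketch of the subset F-test in code (an added illustration): fit the full and reduced models, take $SS_{\mathrm{extra}}$ as the drop in residual sum of squares, and compare $SS_{\mathrm{extra}}/(p-q)$ divided by $S^2$ from the full model with the $F_{p-q,\,n-p}$ distribution. The data are simulated and the split of the regressors is arbitrary.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, q = 40, 5, 3                          # full model: beta_0..beta_4; reduced: beta_0..beta_2
Z = rng.normal(size=(n, p - 1))
X_full = np.column_stack([np.ones(n), Z])   # X = (X_1, X_2)
X_red = X_full[:, :q]                       # first q columns: intercept and q-1 regressors
Y = X_full @ np.array([1.0, 2.0, -1.0, 0.0, 0.0]) + rng.normal(0, 1.0, n)

def sse(X, Y):
    """Residual sum of squares for the least squares fit of Y on X."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ beta
    return r @ r

SSE_full, SSE_red = sse(X_full, Y), sse(X_red, Y)
SS_extra = SSE_red - SSE_full               # = SSR(full) - SSR(reduced)
S2 = SSE_full / (n - p)                     # MSE from the full model

F = (SS_extra / (p - q)) / S2
p_value = stats.f.sf(F, p - q, n - p)
print(F, p_value)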


To determine whether the change in sum of squares is significant, we test the hypothesis
\[
H_0: \beta_q = \beta_{q+1} = \ldots = \beta_{p-1} = 0
\]
versus
\[
H_1: \neg H_0.
\]
It can be shown that, if $H_0$ is true,
\[
F = \frac{SS_{\mathrm{extra}}/(p-q)}{S^2} \sim F_{p-q,\,n-p}.
\]
So we reject $H_0$ at the $\alpha$ level if
\[
F > F_{\alpha;\,p-q,\,n-p}
\]
and conclude that there is sufficient evidence that some (but not necessarily all) of the extra variables $X_q, \ldots, X_{p-1}$ should be included in the model.

The ANOVA table is given by
\[
\begin{array}{lcccc}
\text{Source} & \text{d.f.} & \text{SS} & \text{MS} & \text{VR} \\[2pt]
\text{Overall regression} & p-1 & SS_R^{(\mathrm{full})} & & \\
X_1, \ldots, X_{q-1} & q-1 & SS_R^{(\mathrm{red})} & & \\
X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1} & p-q & SS_{\mathrm{extra}} & \dfrac{SS_{\mathrm{extra}}}{p-q} & \dfrac{SS_{\mathrm{extra}}}{(p-q)MS_E} \\[6pt]
\text{Residual} & n-p & SS_E & MS_E & \\
\text{Total} & n-1 & SS_T & &
\end{array}
\]
In the ANOVA table we use the notation $X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1}$ to denote that this is the effect of the variables $X_q, \ldots, X_{p-1}$ given that the variables $X_1, \ldots, X_{q-1}$ are already included in the model.

Note that, as the $F_{1,\nu}$ distribution is equivalent to $t^2_\nu$, the F-test for $H_0: \beta_{p-1} = 0$, that is for the inclusion of a single variable $X_{p-1}$ (this is the case $q = p-1$), can also be performed by an equivalent T-test, where
\[
T = \frac{\hat{\beta}_{p-1}}{se(\hat{\beta}_{p-1})} \sim t_{n-p},
\]
where $se(\hat{\beta}_{p-1})$ is the estimated standard error of $\hat{\beta}_{p-1}$.

Also, note that we can repeatedly test individual parameters and we get the following sums of squares and degrees of freedom


\[
\begin{array}{lcc}
\text{Source of variation} & \text{df} & \text{SS} \\[2pt]
\text{Full model} & p-1 & SS_R \\
\quad X_1 & 1 & SS(1) \\
\quad X_2 \mid X_1 & 1 & SS(2 \mid 1) \\
\quad X_3 \mid X_1, X_2 & 1 & SS(3 \mid 1, 2) \\
\quad \vdots & \vdots & \vdots \\
\quad X_{p-1} \mid X_1, \ldots, X_{p-2} & 1 & SS(p-1 \mid 1, \ldots, p-2) \\
\text{Residual} & n-p & SS_E \\
\text{Total} & n-1 & SS_T
\end{array}
\]

The output depends on the order in which the predictors are entered into the model. The sequential sum of squares is the unique portion of $SS_R$ explained by a predictor, given any previously entered predictors. If you have a model with three predictors, $X_1$, $X_2$ and $X_3$, the sequential sum of squares for $X_3$ shows how much of the remaining variation $X_3$ explains given that $X_1$ and $X_2$ are already in the model.
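The order dependence is easy to see numerically: entering the same predictors in two different orders gives different sequential sums of squares for each predictor, although they add up to the same $SS_R$. A sketch with simulated, correlated predictors (an added illustration, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)   # correlated with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ssr(cols):
    """Regression sum of squares for a model with an intercept and the given columns."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return np.sum((fitted - y.mean())**2)

# Sequential SS entering x1 first, then x2
seq_a = [ssr([x1]), ssr([x1, x2]) - ssr([x1])]
# Sequential SS entering x2 first, then x1
seq_b = [ssr([x2]), ssr([x2, x1]) - ssr([x2])]

print(seq_a, sum(seq_a))    # different individual Seq SS ...
print(seq_b, sum(seq_b))    # ... but the same total SSR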