Multiple Regression Analysis (multiple regression.pdf)
TRANSCRIPT
7/30/2019 Multiple Regression.pdf
Chapter 3
Multiple Regression
3.1 Multiple Linear Regression Model
A fitted linear regression model always leaves some residual variation. There
might be another systematic cause for the variability in the observations yi. If we
have data on other explanatory variables, we can ask whether they can be used to
explain some of the residual variation in Y. If this is the case, we should take it into
account in the model, so that the errors are purely random. We could write

Yi = β0 + β1xi + β2zi + εi,

where β2zi + εi together make up what was previously the single error term εi.
Z is another explanatory variable. Usually we denote all explanatory variables
(there may be more than two of them) using the letter X with an index to distinguish
between them, i.e., X1, X2, . . . , Xp−1.
Example 3.1. (Neter et al., 1996) Dwine Studios, Inc.
The company operates portrait studios in 21 cities of medium size. These studios
specialize in portraits of children. The company is considering an expansion into
other cities of medium size and wishes to investigate whether sales (Y) in a community
can be predicted from the number of persons aged 16 or younger in the
community (X1) and the per capita disposable personal income in the community
(X2).

If we use just X2 (per capita disposable personal income in the community) to
model Y (sales in the community), we obtain the following model fit.
The regression equation is
Y = - 352.5 + 31.17 X2
S = 20.3863 R-Sq = 69.9% R-Sq(adj) = 68.3%
Analysis of Variance
Source DF SS MS F P
Regression 1 18299.8 18299.8 44.03 0.000
Error 19 7896.4 415.6
Total 20 26196.2
Figure 3.1: (a) Fitted line plot for Dwine Studios sales versus per capita disposable personal income in the community. (b) Residual plots.
The regression is highly significant, but R² is rather small. This suggests that there could be
other factors which are also important for the sales. We have data on the number of persons aged
16 or younger in the community, so we will examine whether the residuals of the above fit are
related to this variable. If so, then including it in the model may improve the fit.
Figure 3.2: The dependence of the residuals on X1.
Indeed, the residuals show a possible relationship with the number of persons aged 16 or younger
in the community. We will fit the model with both variables, X1 and X2 included, that is
Yi = β0 + β1x1i + β2x2i + εi,   i = 1, . . . , n.

The model fit is as follows.
The regression equation is
Y = - 68.9 + 1.45 X1 + 9.37 X2
Predictor Coef SE Coef T P
Constant -68.86 60.02 -1.15 0.266
X1 1.4546 0.2118 6.87 0.000
X2 9.366 4.064 2.30 0.033
S = 11.0074 R-Sq = 91.7% R-Sq(adj) = 90.7%
Analysis of Variance
Source DF SS MS F P
Regression 2 24015 12008 99.10 0.000
Residual Error 18 2181 121
Total 20 26196
Here we see that the intercept parameter is not significantly different from zero (p = 0.266), and
so the model without the intercept was fitted. R² is now close to 100% and both parameters are
highly significant.
Regression Equation
Y = 1.62 X1 + 4.75 X2
Coefficients
Term Coef SE Coef T P
X1 1.62175 0.154948 10.4664 0.000
X2 4.75042 0.583246 8.1448 0.000
S = 11.0986 R-Sq = 99.68% R-Sq(adj) = 99.64%
Analysis of Variance
Source DF Seq SS Adj SS Adj MS F P
Regression 2 718732 718732 359366 2917.42 0.000
Error 19 2340 2340 123
Total 21 721072
Figure 3.3: Fitted surface plot and the Dwine Studios observations.
A Multiple Linear Regression (MLR) model for a response variable Y and explanatory
variables X1, X2, . . . , Xp−1 is

E(Y | X1 = x1i, . . . , Xp−1 = xp−1,i) = β0 + β1x1i + · · · + βp−1xp−1,i
var(Y | X1 = x1i, . . . , Xp−1 = xp−1,i) = σ²,   i = 1, . . . , n
cov(Y | X1 = x1i, . . . , Xp−1 = xp−1,i, Y | X1 = x1j, . . . , Xp−1 = xp−1,j) = 0,   i ≠ j

As in the SLR model we denote

Yi = (Y | X1 = x1i, . . . , Xp−1 = xp−1,i)

and we usually omit the condition on the Xs and write

μi = E(Yi) = β0 + β1x1i + · · · + βp−1xp−1,i
var(Yi) = σ²,   i = 1, . . . , n
cov(Yi, Yj) = 0,   i ≠ j

or

Yi = β0 + β1x1i + · · · + βp−1xp−1,i + εi
E(εi) = 0
var(εi) = σ²,   i = 1, . . . , n
cov(εi, εj) = 0,   i ≠ j

For testing we need the assumption of Normality, i.e., we assume that

Yi ~ N(μi, σ²) independently
or

εi ~ N(0, σ²) independently.
To simplify the notation we write the MLR model in matrix form

Y = Xβ + ε,   (3.1)

where

Y = (Y1, Y2, . . . , Yn)ᵀ is the vector of responses,
X is the n × p matrix, often called the design matrix, whose ith row is (1, x1,i, . . . , xp−1,i),
β = (β0, β1, . . . , βp−1)ᵀ is the vector of unknown, constant parameters,
ε = (ε1, ε2, . . . , εn)ᵀ is the vector of random errors.

The εi are independent and identically distributed, that is

ε ~ Nn(0n, σ²In).

Note that the properties of the errors give

Y ~ Nn(Xβ, σ²In).
3.2 Least squares estimation
To derive the least squares estimator (LSE) for the parameter vector β we minimise
the sum of squares of the errors, that is

S(β) = Σᵢ₌₁ⁿ [Yi − {β0 + β1x1,i + · · · + βp−1xp−1,i}]²
     = Σ εi²
     = εᵀε
     = (Y − Xβ)ᵀ(Y − Xβ)
     = (Yᵀ − βᵀXᵀ)(Y − Xβ)
     = YᵀY − YᵀXβ − βᵀXᵀY + βᵀXᵀXβ
     = YᵀY − 2βᵀXᵀY + βᵀXᵀXβ.
Theorem 3.1. The LSE β̂ of β is given by

β̂ = (XᵀX)⁻¹XᵀY

if XᵀX is non-singular. If XᵀX is singular there is no unique LSE of β.

Proof. Let β∗ be any solution of XᵀXβ = XᵀY. Then XᵀXβ∗ = XᵀY and

S(β) − S(β∗)
 = YᵀY − 2βᵀXᵀY + βᵀXᵀXβ − YᵀY + 2β∗ᵀXᵀY − β∗ᵀXᵀXβ∗
 = −2βᵀXᵀXβ∗ + βᵀXᵀXβ + 2β∗ᵀXᵀXβ∗ − β∗ᵀXᵀXβ∗
 = βᵀXᵀXβ − 2βᵀXᵀXβ∗ + β∗ᵀXᵀXβ∗
 = βᵀXᵀXβ − βᵀXᵀXβ∗ − βᵀXᵀXβ∗ + β∗ᵀXᵀXβ∗
 = βᵀXᵀXβ − βᵀXᵀXβ∗ − β∗ᵀXᵀXβ + β∗ᵀXᵀXβ∗   (βᵀXᵀXβ∗ is a scalar, so it equals its transpose β∗ᵀXᵀXβ)
 = βᵀ(XᵀXβ − XᵀXβ∗) − β∗ᵀ(XᵀXβ − XᵀXβ∗)
 = (βᵀ − β∗ᵀ)(XᵀXβ − XᵀXβ∗)
 = (βᵀ − β∗ᵀ)XᵀX(β − β∗)
 = {X(β − β∗)}ᵀ{X(β − β∗)} ≥ 0,

since it is a sum of squares of the elements of the vector X(β − β∗).

We have shown that S(β) − S(β∗) ≥ 0. Hence β∗ minimises S(β), i.e. any solution of
XᵀXβ = XᵀY minimises S(β).

If XᵀX is non-singular the unique solution is β̂ = (XᵀX)⁻¹XᵀY. If XᵀX is singular there
is no unique solution.
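To make the result concrete, here is a minimal Python sketch (not from the notes; the data and all names are invented) that solves the normal equations XᵀXβ = XᵀY on synthetic data and checks the answer against NumPy's least-squares routine:

```python
import numpy as np

# Synthetic data: n observations, p parameters (including the intercept).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# LSE via the normal equations: X'X beta = X'Y (X'X is non-singular here).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer from NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice one solves the normal equations (or uses a QR decomposition, as `lstsq` does) rather than explicitly inverting XᵀX, which is numerically safer.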
Note that, as we did for the SLM in Chapter 2, it is possible to obtain this result
by differentiating S(β) with respect to β and setting the derivative equal to 0.
3.2.1 Properties of the least squares estimator
Theorem 3.2. If

Y = Xβ + ε,   ε ~ Nn(0, σ²I),

then

β̂ ~ Np(β, σ²(XᵀX)⁻¹).

Proof. Each element of β̂ is a linear function of Y1, . . . , Yn. We assume that the
Yi, i = 1, . . . , n, are normally distributed, hence β̂ is also normally distributed.
The expectation and variance-covariance matrix can be derived in the same way as
in Theorem 2.7.
Remark 3.1. The vector of fitted values is given by

Ŷ = Xβ̂ = X(XᵀX)⁻¹XᵀY = HY.

The matrix H = X(XᵀX)⁻¹Xᵀ is called the hat matrix.

Note that

Hᵀ = H

and also

HH = X(XᵀX)⁻¹ XᵀX(XᵀX)⁻¹ Xᵀ = X(XᵀX)⁻¹Xᵀ = H,

since XᵀX(XᵀX)⁻¹ = I. A matrix which satisfies the condition AA = A is called an
idempotent matrix. Note that if A is idempotent, then (I − A) is also idempotent.
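The hat-matrix identities above are easy to verify numerically. A small sketch with an arbitrary synthetic design matrix (all names here are ours):

```python
import numpy as np

# Build a hat matrix H = X(X'X)^{-1}X' from a synthetic design matrix
# and check symmetry and idempotency of H and of I - H.
rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H, H.T)                    # H' = H
assert np.allclose(H @ H, H)                  # HH = H
I = np.eye(n)
assert np.allclose((I - H) @ (I - H), I - H)  # I - H is also idempotent
```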
We now prove some results about the residual vector

e = Y − Ŷ = Y − HY = (I − H)Y.
As in Theorem 2.8, here we have
Lemma 3.1. E(e) = 0.

Proof.

E(e) = (I − H)E(Y) = (I − X(XᵀX)⁻¹Xᵀ)Xβ = Xβ − Xβ = 0.
Lemma 3.2. Var(e) = σ²(I − H).

Proof.

Var(e) = (I − H) Var(Y) (I − H)ᵀ = (I − H) σ²I (I − H) = σ²(I − H).
Lemma 3.3. The sum of squares of the residuals is Yᵀ(I − H)Y.

Proof.

Σᵢ₌₁ⁿ eᵢ² = eᵀe = Yᵀ(I − H)ᵀ(I − H)Y = Yᵀ(I − H)Y.
Lemma 3.4. The elements of the residual vector e sum to zero, i.e.

Σᵢ₌₁ⁿ eᵢ = 0.
Proof. We will prove this by contradiction.

Assume that Σeᵢ = nc where c ≠ 0. Then

Σeᵢ² = Σ{(eᵢ − c) + c}²
     = Σ(eᵢ − c)² + 2c Σ(eᵢ − c) + nc²
     = Σ(eᵢ − c)² + 2c(Σeᵢ − nc) + nc²
     = Σ(eᵢ − c)² + nc²      (since Σeᵢ = nc)
     > Σ(eᵢ − c)².

But we know that Σeᵢ² is the minimum value of S(β), so there cannot exist values
with a smaller sum of squares, and this gives the required contradiction. So c = 0.
Corollary 3.1.

(1/n) Σᵢ₌₁ⁿ Ŷᵢ = Ȳ.

Proof. The residuals eᵢ = Yᵢ − Ŷᵢ, so Σeᵢ = Σ(Yᵢ − Ŷᵢ), but Σeᵢ = 0. Hence
ΣYᵢ = ΣŶᵢ, and the result follows.
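Lemma 3.4 and Corollary 3.1 can be checked numerically. The sketch below (synthetic data, assuming an intercept column in X, with our own variable names) confirms both:

```python
import numpy as np

# With an intercept column in X, the residuals e = (I - H)Y sum to zero
# and the fitted values average to Y-bar.
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(size=n)
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_fit = H @ Y
e = Y - Y_fit

assert abs(e.sum()) < 1e-8                 # residuals sum to zero
assert np.isclose(Y_fit.mean(), Y.mean())  # mean of fitted values = Y-bar
```

Note that both properties rely on the model containing an intercept (a column of ones in X), which is what makes Lemma 3.4 applicable.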
3.3 Analysis of Variance
We begin this section by proving the basic Analysis of Variance identity.
Theorem 3.3. The total sum of squares splits into the regression sum of squares
and the residual sum of squares, that is
SST = SSR + SSE.
Proof.

SST = Σ(Yi − Ȳ)² = ΣYi² − nȲ² = YᵀY − nȲ².
SSR = Σ(Ŷi − Ȳ)²
    = ΣŶi² − 2Ȳ ΣŶi + nȲ²
    = ΣŶi² − nȲ²      (since ΣŶi = nȲ)
    = ŶᵀŶ − nȲ²
    = β̂ᵀXᵀXβ̂ − nȲ²
    = YᵀX(XᵀX)⁻¹ XᵀX(XᵀX)⁻¹ XᵀY − nȲ²
    = YᵀHY − nȲ²,

using XᵀX(XᵀX)⁻¹ = I in the last step.

We have seen (Lemma 3.3) that

SSE = Yᵀ(I − H)Y

and so

SSR + SSE = YᵀHY − nȲ² + Yᵀ(I − H)Y = YᵀY − nȲ²
= SST
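The identity SST = SSR + SSE can be verified numerically with the quadratic forms used in the proof; a sketch on synthetic data (all names are ours):

```python
import numpy as np

# Check SST = SSR + SSE using SSR = Y'HY - n*Ybar^2, SSE = Y'(I-H)Y.
rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)
H = X @ np.linalg.inv(X.T @ X) @ X.T

SST = Y @ Y - n * Y.mean() ** 2
SSR = Y @ H @ Y - n * Y.mean() ** 2
SSE = Y @ (np.eye(n) - H) @ Y
assert np.isclose(SST, SSR + SSE)
```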
F-test for the Overall Significance of Regression

Suppose we wish to test the hypothesis

H0 : β1 = β2 = . . . = βp−1 = 0,

i.e. all coefficients except β0 are zero, versus

H1 : ¬H0,

which means that at least one of the coefficients is non-zero. Under H0, the model
reduces to the null model

Y = 1β0 + ε,
where 1 is a vector of ones.

In testing H0 we are asking if there is sufficient evidence to reject the null model.

The Analysis of Variance table is given by

Source              d.f.    SS             MS             VR
Overall regression  p − 1   YᵀHY − nȲ²     SSR/(p − 1)    MSR/MSE
Residual            n − p   Yᵀ(I − H)Y     SSE/(n − p)
Total               n − 1   YᵀY − nȲ²

As in the SLM we have n − 1 total degrees of freedom. Fitting a linear model with
p parameters (β0, β1, . . . , βp−1) leaves n − p residual d.f. Then the regression d.f.
are n − 1 − (n − p) = p − 1.

It can be shown that E(SSE) = (n − p)σ², that is, MSE is an unbiased estimator
of σ². Also,

SSE/σ² ~ χ²(n − p)

and, if β1 = . . . = βp−1 = 0, then

SSR/σ² ~ χ²(p − 1).

The two statistics are independent, hence

MSR/MSE ~ F(p − 1, n − p) under H0.

This is a test function for the null hypothesis

H0 : β1 = β2 = . . . = βp−1 = 0

versus

H1 : ¬H0.

We reject H0 at the 100α% level of significance if

Fobs > F(α; p − 1, n − p),

where F(α; p − 1, n − p) is such that P(F < F(α; p − 1, n − p)) = 1 − α.
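A sketch (synthetic data, our own names) computing the overall F statistic and checking it against the algebraically equivalent R² form, F = {R²/(p − 1)} / {(1 − R²)/(n − p)}:

```python
import numpy as np

# Overall-regression F statistic from the ANOVA decomposition.
rng = np.random.default_rng(4)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([2.0, 1.0, 0.0, -1.5]) + rng.normal(size=n)
H = X @ np.linalg.inv(X.T @ X) @ X.T

SST = Y @ Y - n * Y.mean() ** 2
SSE = Y @ (np.eye(n) - H) @ Y
SSR = SST - SSE
F = (SSR / (p - 1)) / (SSE / (n - p))   # MSR / MSE
R2 = SSR / SST
# Equivalent form of the same statistic:
assert np.isclose(F, (R2 / (p - 1)) / ((1 - R2) / (n - p)))
```

The observed F would then be compared with the F(α; p − 1, n − p) quantile, e.g. from statistical tables or a distribution library.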
3.4 Inferences about the parameters
In Theorem 3.2 we have seen that

β̂ ~ Np(β, σ²(XᵀX)⁻¹).

Therefore

β̂j ~ N(βj, σ²cjj),   j = 0, 1, 2, . . . , p − 1,

where cjj is the jth diagonal element of (XᵀX)⁻¹ (counting from 0 to p − 1).
Hence it is straightforward to make inferences about βj in the usual way.

A 100(1 − α)% confidence interval for βj is

β̂j ± t(α/2, n − p) √(S²cjj),

where S² = MSE.

The test statistic for H0 : βj = 0 versus H1 : βj ≠ 0 is

T = β̂j / √(S²cjj) ~ t(n − p) if H0 is true.

Care is needed in interpreting the confidence intervals and tests. They refer only to
the model we are currently fitting. Thus not rejecting H0 : βj = 0 does not mean
that Xj has no explanatory power; it means that, conditional on X1, . . . , Xj−1, Xj+1, . . . , Xp−1
being in the model, Xj has no additional power.

It is often best to think of the test as comparing the models without and with Xj, i.e.

H0 : E(Yi) = β0 + β1x1,i + · · · + βj−1xj−1,i + βj+1xj+1,i + · · · + βp−1xp−1,i

versus

H1 : E(Yi) = β0 + β1x1,i + · · · + βp−1xp−1,i.

It does not tell us anything about the comparison between the models E(Yi) = β0 and
E(Yi) = β0 + βjxj,i.
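The quantities above are straightforward to compute. A sketch (synthetic data, invented names) that forms se(β̂j) = √(S²cjj) from the diagonal of (XᵀX)⁻¹ and the corresponding t statistics:

```python
import numpy as np

# Coefficient standard errors and t statistics for an MLR fit.
rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 3.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
S2 = e @ e / (n - p)                  # MSE, unbiased estimator of sigma^2
se = np.sqrt(S2 * np.diag(XtX_inv))   # se(beta_hat_j) = sqrt(S^2 * c_jj)
t = beta_hat / se                     # compare with t(n - p) quantiles
```

Each t value would be compared with the t(α/2, n − p) quantile to test H0 : βj = 0 within this particular model.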
3.5 Confidence interval for μ

We have

E(Y) = μ = Xβ.
As with simple linear regression, we might want to estimate the expected response
at a specific x, say x0 = (1, x1,0, . . . , xp−1,0)ᵀ, i.e.

μ0 = E(Y | X1 = x1,0, . . . , Xp−1 = xp−1,0).

The point estimate will be μ̂0 = x0ᵀβ̂.
Assuming normality, as usual, we can obtain a confidence interval for μ0.

Theorem 3.4.

μ̂0 ~ N(μ0, σ²x0ᵀ(XᵀX)⁻¹x0).

Proof.

(i) μ̂0 = x0ᵀβ̂ is a linear combination of β̂0, β̂1, . . . , β̂p−1, each of which is normal.
Hence μ̂0 is also normal.

(ii) E(μ̂0) = E(x0ᵀβ̂) = x0ᵀE(β̂) = x0ᵀβ = μ0.

(iii) Var(μ̂0) = Var(x0ᵀβ̂) = x0ᵀVar(β̂)x0 = σ²x0ᵀ(XᵀX)⁻¹x0.

The following corollary is a consequence of Theorem 3.4.

Corollary 3.2. A 100(1 − α)% confidence interval for μ0 is

μ̂0 ± t(α/2, n − p) √(S²x0ᵀ(XᵀX)⁻¹x0).
3.6 Predicting a new observation
To predict a new observation we need to take into account not only its expectation,
but also a possible new random error.
The point estimator of a new observation

Y0 = (Y | X1 = x1,0, . . . , Xp−1 = xp−1,0) = μ0 + ε0

is

Ŷ0 = x0ᵀβ̂ (= μ̂0),

which, assuming normality, is such that

Ŷ0 ~ N(μ0, σ²x0ᵀ(XᵀX)⁻¹x0).

Then

Ŷ0 − μ0 ~ N(0, σ²x0ᵀ(XᵀX)⁻¹x0)

and

Ŷ0 − (μ0 + ε0) ~ N(0, σ²x0ᵀ(XᵀX)⁻¹x0 + σ²).

That is,

Ŷ0 − Y0 ~ N(0, σ²{1 + x0ᵀ(XᵀX)⁻¹x0})

and hence

(Ŷ0 − Y0) / √(σ²{1 + x0ᵀ(XᵀX)⁻¹x0}) ~ N(0, 1).

As usual we estimate σ² by S² and get

(Ŷ0 − Y0) / √(S²{1 + x0ᵀ(XᵀX)⁻¹x0}) ~ t(n − p).

Hence a 100(1 − α)% prediction interval for Y0 is given by

Ŷ0 ± t(α/2, n − p) √(S²{1 + x0ᵀ(XᵀX)⁻¹x0}).
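The only difference between the confidence interval for μ0 and the prediction interval for Y0 is the extra "+1" inside the variance term, so the prediction interval is always wider. A numerical sketch (synthetic data, invented x0):

```python
import numpy as np

# Compare the variance terms behind the CI for mu_0 and the PI for Y_0.
rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = (Y - X @ beta_hat) @ (Y - X @ beta_hat) / (n - p)

x0 = np.array([1.0, 0.2, -0.3])            # new point, intercept term first
var_mean = S2 * x0 @ XtX_inv @ x0          # variance term for the CI of mu_0
var_pred = S2 * (1 + x0 @ XtX_inv @ x0)    # variance term for the PI of Y_0
assert var_pred > var_mean                 # the PI is always wider
```

The gap between the two is exactly S², the estimated variance of the new error ε0.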
3.7 Model Building
We have already mentioned the principle of parsimony; we should use the simplest
model that achieves our purpose.
It is easy to get a simple model (Yi = β0 + εi) and it is easy to represent the
response by the data themselves. However, the first is generally too simple and
the second is not a useful model. Achieving a simple model that describes the
data well is something of an art. Often there is more than one model which does
a reasonable job.
Example 3.2. Sales
A company is interested in the dependence of sales on promotional expenditure
(X1, in 1000s), the number of active accounts (X2), the district potential (X3,
coded), and the number of competing brands (X4). We will try to find a good
multiple regression model for the response variable Y (sales).

Data on last year's sales (Y, in 100,000s) in 15 sales districts are given in the file
Sales.txt on the course website.
Figure 3.4: The Matrix Plot indicates that Y is clearly related to X4 and also to X2. The relation with the other explanatory variables is not as obvious.
Let us start by fitting a simple regression model of Y as a function of X4 only.
The regression equation is
Y = 396 - 25.1 X4
Predictor Coef SE Coef T P
Constant 396.07 49.25 8.04 0.000
X4 -25.051 5.242 -4.78 0.000
S = 49.9868 R-Sq = 63.7% R-Sq(adj) = 60.9%
Analysis of Variance
Source DF SS MS F P
Regression 1 57064 57064 22.84 0.000
Residual Error 13 32483 2499
Total 14 89547
The plot of residuals versus fitted values indicates that there may be non-constant variance,
and the linearity of the model is also questionable. We will add X2 to the model.
The regression equation is
Y = 190 - 22.3 X4 + 3.57 X2
Predictor Coef SE Coef T P
Constant 189.83 10.13 18.74 0.000
X4 -22.2744 0.7076 -31.48 0.000
X2 3.5692 0.1333 26.78 0.000
S = 6.67497 R-Sq = 99.4% R-Sq(adj) = 99.3%
Analysis of Variance
Source DF SS MS F P
Regression 2 89012 44506 998.90 0.000
Residual Error 12 535 45
Total 14 89547

Source DF Seq SS
X4 1 57064
X2 1 31948

Still, there is some evidence that the standardized residuals may not have constant variance. Will
this change if we add X3 to the model?
The regression equation is
Y = 190 - 22.3 X4 + 3.56 X2 + 0.049 X3
Predictor Coef SE Coef T P
Constant 189.60 10.76 17.62 0.000
X4 -22.2679 0.7408 -30.06 0.000
X2 3.5633 0.1482 24.05 0.000
X3 0.0491 0.4290 0.11 0.911
S = 6.96763 R-Sq = 99.4% R-Sq(adj) = 99.2%
Analysis of Variance
Source DF SS MS F P
Regression 3 89013 29671 611.17 0.000
Residual Error 11 534 49
Total 14 89547
Source DF Seq SS
X4 1 57064
X2 1 31948
X3 1 1
Not much better than before. Now we add X1, the explanatory variable least related to Y.
The regression equation is
Y = 177 - 22.2 X4 + 3.54 X2 + 0.204 X3 + 2.17 X1
Predictor Coef SE Coef T P
Constant 177.229 8.787 20.17 0.000
X4 -22.1583 0.5454 -40.63 0.000
X2 3.5380 0.1092 32.41 0.000
X3 0.2035 0.3189 0.64 0.538
X1 2.1702 0.6737 3.22 0.009
S = 5.11930 R-Sq = 99.7% R-Sq(adj) = 99.6%
Analysis of Variance
Source DF SS MS F P
Regression 4 89285 22321 851.72 0.000
Residual Error 10 262 26
Total 14 89547
Source DF Seq SS
X4 1 57064
X2 1 31948
X3 1 1
X1 1 272
The residuals now do not contradict the model assumptions. We analyze the numerical output.
Here we see that X3 may be a redundant variable, as we have no evidence to reject the hypothesis
that β3 = 0 given that all the other variables are in the model. Hence, we will fit a new model
without X3.
The regression equation is
Y = 179 - 22.2 X4 + 3.56 X2 + 2.11 X1
Predictor Coef SE Coef T P
Constant 178.521 8.318 21.46 0.000
X4 -22.1880 0.5286 -41.98 0.000
X2 3.56240 0.09945 35.82 0.000
X1 2.1055 0.6479 3.25 0.008
S = 4.97952 R-Sq = 99.7% R-Sq(adj) = 99.6%
Analysis of Variance
Source DF SS MS F P
Regression 3 89274 29758 1200.14 0.000
Residual Error 11 273 25
Total 14 89547
Source DF Seq SS
X4 1 57064
X2 1 31948
X1 1 262
These residual plots also do not contradict the model assumptions.
On its own, variable X1 explains only 1% of the variation, but once X2 and X4 are included in the
model, X1 is significant and also seems to cure the problems with normality and non-constant
variance.
3.7.1 F-test for the deletion of a subset of variables
Suppose the overall regression model as tested by the Analysis of Variance table
is significant. We know that not all of the parameters are zero, but we may still
be able to delete several variables.

We can carry out the Subset Test based on the extra sum of squares principle. We
are asking if we can reduce the set of regressors

X1, X2, . . . , Xp−1

to, say,

X1, X2, . . . , Xq−1

(renumbering if necessary) where q < p, by omitting Xq, Xq+1, . . . , Xp−1.

We are interested in whether the inclusion of Xq, Xq+1, . . . , Xp−1 in the model
provides a significant increase in the overall regression sum of squares, or equivalently
a significant decrease in the residual sum of squares.

The difference between the sums of squares is called the extra sum of squares due
to Xq, . . . , Xp−1 given X1, . . . , Xq−1 are already in the model, and is defined by
the equation
SS(Xq, . . . , Xp−1 | X1, . . . , Xq−1)
  = SS(X1, X2, . . . , Xp−1) − SS(X1, X2, . . . , Xq−1)
    (regression SS for the full model minus regression SS for the reduced model)
  = SSE(red) − SSE(full)
    (residual SS under the reduced model minus residual SS under the full model).

Notation: let

β1ᵀ = (β0, β1, . . . , βq−1),
β2ᵀ = (βq, βq+1, . . . , βp−1),

so that βᵀ = (β1ᵀ, β2ᵀ). Similarly divide X into two submatrices X1 and X2, so that
X = (X1, X2), where X1 is the n × q matrix with rows (1, x1,i, . . . , xq−1,i) and X2 is
the n × (p − q) matrix with rows (xq,i, . . . , xp−1,i).

The full model

Y = Xβ + ε = X1β1 + X2β2 + ε

has

SSR(full) = YᵀHY − nȲ² = β̂ᵀXᵀY − nȲ²,
SSE(full) = Yᵀ(I − H)Y = YᵀY − β̂ᵀXᵀY.

Similarly the reduced model

Y = X1β1 + ε

has

SSR(red) = β̂1ᵀX1ᵀY − nȲ²,
SSE(red) = YᵀY − β̂1ᵀX1ᵀY.

Hence the extra sum of squares is

SSextra = β̂ᵀXᵀY − β̂1ᵀX1ᵀY.
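The extra-sum-of-squares computation can be sketched numerically as SSextra = SSE(red) − SSE(full), which is always non-negative for nested models (synthetic data, our own names):

```python
import numpy as np

# Extra sum of squares for adding two columns X2 to a reduced design X1.
rng = np.random.default_rng(7)
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # reduced design
X2 = rng.normal(size=(n, 2))                            # extra columns
Xf = np.column_stack([X1, X2])                          # full design
Y = Xf @ np.array([1.0, 2.0, 0.5, -0.5]) + rng.normal(size=n)

def sse(X, Y):
    """Residual sum of squares Y'(I - H)Y for design matrix X."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return Y @ (np.eye(len(Y)) - H) @ Y

SS_extra = sse(X1, Y) - sse(Xf, Y)
assert SS_extra >= 0   # adding regressors can never increase the SSE
```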
To determine whether the change in sum of squares is significant, we test the
hypothesis

H0 : βq = βq+1 = . . . = βp−1 = 0

versus

H1 : ¬H0.

It can be shown that, if H0 is true,

F = {SSextra/(p − q)} / S² ~ F(p − q, n − p).

So, we reject H0 at the α level if

F > F(α; p − q, n − p)

and conclude that there is sufficient evidence that some (but not necessarily all) of
the extra variables Xq, . . . , Xp−1 should be included in the model.

The ANOVA table is given by

Source                                 d.f.    SS          MS                 VR
Overall regression                     p − 1   SSR(full)
X1, . . . , Xq−1                       q − 1   SSR(red)
Xq, . . . , Xp−1 | X1, . . . , Xq−1    p − q   SSextra     SSextra/(p − q)    SSextra/{(p − q)MSE}
Residual                               n − p   SSE         MSE
Total                                  n − 1   SST

In the ANOVA table we use the notation Xq, . . . , Xp−1 | X1, . . . , Xq−1 to denote
the effect of the variables Xq, . . . , Xp−1 given that the variables X1, . . . , Xq−1
are already included in the model.

Note that, as the F distribution with (1, ν) d.f. is the distribution of t²(ν), the F-test for
H0 : βp−1 = 0, that is for the inclusion of a single variable Xp−1 (this is the case
q = p − 1), can also be performed by an equivalent t-test, where

T = β̂p−1 / se(β̂p−1) ~ t(n − p),

where se(β̂p−1) is the estimated standard error of β̂p−1.

Also, note that we can repeatedly test individual parameters and we get the following
sums of squares and degrees of freedom
Source of variation          df      SS
Full model                   p − 1   SSR
X1                           1       SS(1)
X2 | X1                      1       SS(2|1)
X3 | X1, X2                  1       SS(3|1, 2)
...
Xp−1 | X1, . . . , Xp−2      1       SS(p−1 | 1, . . . , p−2)
Residual                     n − p   SSE
Total                        n − 1   SST

The output depends on the order in which the predictors are entered into the model. The
sequential sum of squares is the unique portion of SSR explained by a predictor,
given any previously entered predictors. If you have a model with three predictors,
X1, X2, and X3, the sequential sum of squares for X3 shows how much of the
remaining variation X3 explains given that X1 and X2 are already in the model.
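The order-dependence of sequential sums of squares is easy to demonstrate: with correlated predictors, SS(X2) entered first differs from SS(X2 | X1). A sketch on synthetic data (all names invented for the demonstration):

```python
import numpy as np

# Sequential (Type I) SS depends on entry order when predictors correlate.
rng = np.random.default_rng(8)
n = 60
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x2 correlated with x1
Y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def sse(cols):
    """Residual SS for a model with an intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + cols)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return Y @ (np.eye(n) - H) @ Y

ss_x2_first = sse([]) - sse([x2])            # SS(X2), entered first
ss_x2_given_x1 = sse([x1]) - sse([x1, x2])   # SS(X2 | X1)
assert not np.isclose(ss_x2_first, ss_x2_given_x1)
```

Entered first, x2 also absorbs variation it shares with x1 through their correlation; entered second, it is credited only with the remaining variation, so the two sums of squares differ.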