
Topic 16 - Multicollinearity: diagnostics and remedies

STAT/MATH 571A

Professor Lingling An

Outline

• Measures of multicollinearity

• Remedies

16-1

Informal diagnostics - multicollinearity

• regression coefficients change greatly when predictors are included in/excluded from the model

• significant F-test but no significant t-tests for the β's (ignoring the intercept)

• regression coefficients that don't "make sense", i.e., don't match the scatterplot and/or intuition

• Type I and Type II SS very different

• predictors that have large pairwise correlations (a quick check is sketched below)
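A minimal SAS sketch of that last check, assuming the body-fat data of the later example have already been read into data set a1:

proc corr data=a1;
  var skinfold thigh midarm;   /* large pairwise correlations flag potential multicollinearity */
run;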

16-2

Measures of multicollinearity

16-3

Variance Inflation Factor

• The VIF is related to the variance of the estimated regression coefficients

• $\sigma^2\{b\} = \sigma^2 (X'X)^{-1}$, where $b = (b_0, b_1, \ldots, b_{p-1})'$

• for the standardized coefficients $b^* = \left(\frac{s_1}{s_Y}b_1, \ldots, \frac{s_{p-1}}{s_Y}b_{p-1}\right)'$ it can be shown that $\sigma^2\{b^*\} = (\sigma^*)^2\, r_{XX}^{-1}$

• where $r_{XX}$ is the correlation matrix of the $p-1$ predictor variables:

$$r_{XX} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{pmatrix}$$

16-4

Variance Inflation Factor

• $\mathrm{VIF}_k$ is the kth diagonal element of $r_{XX}^{-1}$

• Can show $\mathrm{VIF}_k = (1 - R_k^2)^{-1}$

• where $R_k^2$ is the coefficient of multiple determination when $X_k$ is regressed on the $p-2$ other X variables in the model.

• special case: if there are only two X variables, $R_1^2 = R_2^2 = r_{12}^2$

• so $\mathrm{VIF}_k = 1$ when $R_k^2 = 0$, i.e., when $X_k$ is not linearly related to the other X variables.

• when $R_k^2 \neq 0$, then $\mathrm{VIF}_k > 1$, indicating an inflated variance for $b_k^*$ as a result of multicollinearity among the X variables.

16-5

• when $X_k$ has a perfect linear association with the other X variables in the model, so that $R_k^2 = 1$, then $\mathrm{VIF}_k$ and $\sigma^2\{b_k^*\}$ are unbounded.

• Rule of thumb: a $\mathrm{VIF}_k$ of 10 or more suggests strong multicollinearity

• the mean of the VIF values also provides information about the severity of the multicollinearity

• that is, compare the mean VIF to 1:

$$\overline{\mathrm{VIF}} = \frac{\sum_k \mathrm{VIF}_k}{p-1}$$

• mean VIF values considerably larger than 1 are indicative of serious multicollinearity problems
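A minimal SAS/IML sketch of these formulas (assuming a recent SAS/IML release and the body-fat predictors of the later example in data set a1); the VIF's are read off the diagonal of the inverse correlation matrix:

proc iml;
  use a1;
  read all var {skinfold thigh midarm} into X;
  close a1;
  rXX = corr(X);               /* correlation matrix of the p-1 predictors */
  VIF = vecdiag(inv(rXX));     /* VIF_k = kth diagonal element of inv(rXX) */
  TOL = 1/VIF;                 /* tolerance is the reciprocal of VIF */
  meanVIF = VIF[:];            /* compare the mean VIF to 1 */
  print VIF TOL meanVIF;
quit;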

Tolerance

• Equivalent to VIF

• Tolerance (TOL) = 1/VIF

• Trouble if TOL < 0.1

• SAS gives VIF and TOL results for each predictor

• Use either VIF or TOL

16-6

Example Page 256

• Twenty healthy female subjects

• Y is body fat via underwater weighing

• Underwater weighing expensive/difficult

• X1 is triceps skinfold thickness

• X2 is thigh circumference

• X3 is midarm circumference

16-7

Output

Analysis of Variance

                           Sum of        Mean
Source            DF      Squares      Square    F Value    Pr > F
Model              3    396.98461   132.32820      21.52    <.0001
Error             16     98.40489     6.15031
Corrected Total   19    495.38950

Root MSE          2.47998    R-Square    0.8014
Dependent Mean   20.19500    Adj R-Sq    0.7641
Coeff Var        12.28017

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1     117.08469     99.78240       1.17      0.2578
skinfold      1       4.33409      3.01551       1.44      0.1699
thigh         1      -2.85685      2.58202      -1.11      0.2849
midarm        1      -2.18606      1.59550      -1.37      0.1896

16-8

SAS code for VIF and TOL

proc reg data=a1;

model fat=skinfold thigh midarm/vif tol;

run;

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|    Tolerance
Intercept     1     117.08469     99.78240       1.17      0.2578            .
skinfold      1       4.33409      3.01551       1.44      0.1699      0.00141
thigh         1      -2.85685      2.58202      -1.11      0.2849      0.00177
midarm        1      -2.18606      1.59550      -1.37      0.1896      0.00956

Parameter Estimates

                    Variance
Variable     DF    Inflation
Intercept     1            0
skinfold      1    708.84291
thigh         1    564.34339
midarm        1    104.60601

Note: the tolerances are the reciprocals of the VIF's (e.g., 1/708.84 ≈ 0.00141); all are far below 0.1, signaling severe multicollinearity.

16-9

Some remedial measures

• If the model is a polynomial regression model, use centered variables.

• Drop one or more predictor variables (i.e., variable selection).

  - Standard errors of the parameter estimates decrease.

  - However, how can we tell whether the dropped variable(s) gave us any useful information?

  - If a dropped variable is important, the remaining parameter estimates become biased.

• Sometimes, observations can be designed to break the multicollinearity.

16-10

• Get coefficient estimates from additional data from other contexts. For instance, if the model is $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ and you have an estimator $b_1$ for $\beta_1$ based on another data set, you can estimate $\beta_2$ by regressing the adjusted variable $Y_i' = Y_i - b_1 X_{i1}$ on $X_{i2}$. (Common example: in economics, using cross-sectional data to estimate parameters for a time-dependent model.) A sketch of this adjustment follows the list.

• Use the first few principal components (or factor loadings) of the predictor variables. (Limitation: may lose interpretability.) A sketch also follows the list.
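A minimal SAS sketch of the adjusted-variable idea on the body-fat data; the external estimate b1 = 0.55 is purely hypothetical:

/* hypothetical: b1 = 0.55 for skinfold, taken from another data set */
%let b1 = 0.55;

data adj;
  set a1;
  yadj = fat - &b1*skinfold;   /* Y' = Y - b1*X1 removes the X1 contribution */
run;

proc reg data=adj;
  model yadj = thigh;          /* the slope estimates beta2 */
run;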
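And a sketch of the principal-components idea with PROC PRINCOMP; keeping two components is illustrative only:

proc princomp data=a1 out=pcs;   /* out= adds the scores Prin1-Prin3 */
  var skinfold thigh midarm;
run;

proc reg data=pcs;
  model fat = Prin1 Prin2;       /* regress on the first two components */
run;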

Note: all these methods use the ordinary least squares method; an alternative is to modify the method of least squares - biased regression or coefficient shrinkage (example: ridge regression).

Ridge Regression

• Modification of least squares that overcomes the multicollinearity problem

• Recall that least squares suffers because $(X'X)$ is almost singular, resulting in highly unstable parameter estimates

• Ridge regression results in biased but more stable estimates

• Consider the correlation transformation, so the normal equations are given by $r_{XX} b = r_{YX}$. Since $r_{XX}$ is difficult to invert, we add a bias constant c:

$$b^R = (r_{XX} + cI)^{-1}\, r_{YX}$$
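A minimal SAS/IML sketch of this estimator on the body-fat data (the value c = 0.02 anticipates the choice made below; the back-transformation to the original scale is an added step, not part of the formula above):

proc iml;
  use a1;
  read all var {skinfold thigh midarm} into X;
  read all var {fat} into y;
  close a1;
  n = nrow(X);  p = ncol(X);
  /* correlation transformation, so that Xs`*Xs = rXX and Xs`*ys = rYX */
  Xs = (X - J(n,1)*mean(X)) / (sqrt(n-1) * (J(n,1)*std(X)));
  ys = (y - mean(y)) / (sqrt(n-1) * std(y));
  rXX = Xs` * Xs;
  rYX = Xs` * ys;
  c  = 0.02;                       /* bias constant */
  bR = inv(rXX + c*I(p)) * rYX;    /* standardized ridge estimates */
  b  = (std(y) / std(X))` # bR;    /* back-transform: b_k = (sY/sk) * bR_k */
  b0 = mean(y) - mean(X)*b;        /* intercept on the original scale */
  print bR, b0 b;
quit;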

16-11

Choice of c

• Key to the approach is the choice of c

• Common to use the ridge trace and VIF’s

– Ridge trace: simultaneous plot of the p−1 parameter estimates for different values of c ≥ 0. Curves may fluctuate widely when c is close to zero but eventually stabilize and slowly converge to 0.

– VIF's tend to fall quickly as c moves away from zero and then change only moderately after that

• Choose c where things tend to “stabilize”

• SAS has option ridge=c;

16-12

SAS Commands

options nocenter ls=72;goptions colors=(’none’);data a1;

infile ’C:\datasets\Ch07ta01.txt’;input skinfold thigh midarm fat;

proc print data=a1;run;

proc reg data=a1 outest=bfout;model fat=skinfold thigh midarm /ridge=0 to .1 by .003 outvif;

run;

symbol1 v=’’ i=sm5 l=1;symbol2 v=’’ i=sm5 l=2;symbol3 v=’’ i=sm5 l=3;proc gplot;plot (skinfold thigh midarm)*_ridge_ / overlay vref=0;

run;quit;

16-13

Ridge Trace

[Figure: ridge trace - the p−1 parameter estimates plotted against the bias constant c]

Each value of c gives different values of the parameter estimates. (Note the instability of the estimate values for small c.)

16-14

Graph of VIF's:

[Figure: VIF's for the three predictors plotted against the bias constant c]

16-15

Obs    _RIDGE_    skinfold      thigh     midarm

  2      0.000     708.843    564.343    104.606
  4      0.002      50.559     40.448      8.280
  6      0.004      16.982     13.725      3.363
  8      0.006       8.503      6.976      2.119
 10      0.008       5.147      4.305      1.624
 12      0.010       3.486      2.981      1.377
 14      0.012       2.543      2.231      1.236
 16      0.014       1.958      1.764      1.146
 18      0.016       1.570      1.454      1.086
 20      0.018       1.299      1.238      1.043
 22      0.020       1.103      1.081      1.011
 24      0.022       0.956      0.963      0.986
 26      0.024       0.843      0.872      0.966
 28      0.026       0.754      0.801      0.949
 30      0.028       0.683      0.744      0.935
 32      0.030       0.626      0.697      0.923

Note that at RIDGE = 0.020, the VIF’s are close to 1.
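One way to reproduce this column pattern is the ridge analogue of the VIF, taken here (an assumption, following KNN) as the diagonal of $(r_{XX}+cI)^{-1} r_{XX} (r_{XX}+cI)^{-1}$; a SAS/IML sketch:

proc iml;
  use a1;
  read all var {skinfold thigh midarm} into X;
  close a1;
  rXX = corr(X);
  p = ncol(X);
  do c = 0 to 0.030 by 0.002;
    W = inv(rXX + c*I(p));
    vifc = vecdiag(W * rXX * W)`;   /* ridge VIF's at this value of c */
    print c vifc;
  end;
quit;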

16-16

Obs    _RIDGE_    _RMSE_    Intercept    skinfold      thigh     midarm

  3      0.000    2.47998      117.085     4.33409   -2.85685   -2.18606
  5      0.002    2.54921       22.277     1.46445   -0.40119   -0.67381
  7      0.004    2.57173        7.725     1.02294   -0.02423   -0.44083
  9      0.006    2.58174        1.842     0.84372    0.12820   -0.34604
 11      0.008    2.58739       -1.331     0.74645    0.21047   -0.29443
 13      0.010    2.59104       -3.312     0.68530    0.26183   -0.26185
 15      0.012    2.59360       -4.661     0.64324    0.29685   -0.23934
 17      0.014    2.59551       -5.637     0.61249    0.32218   -0.22278
 19      0.016    2.59701       -6.373     0.58899    0.34131   -0.21004
 21      0.018    2.59822       -6.946     0.57042    0.35623   -0.19991
 23      0.020    2.59924       -7.403     0.55535    0.36814   -0.19163
 25      0.022    2.60011       -7.776     0.54287    0.37786   -0.18470
 27      0.024    2.60087       -8.083     0.53233    0.38590   -0.17881
 29      0.026    2.60156       -8.341     0.52331    0.39265   -0.17372
 31      0.028    2.60218       -8.559     0.51549    0.39837   -0.16926
 33      0.030    2.60276       -8.746     0.50864    0.40327   -0.16531

Note that at RIDGE = 0.020, the RMSE is only increased by about 5% (2.59924/2.47998 ≈ 1.05, so the SSE increases by about 10%), and the parameter estimates are closer to making sense.

16-17

Conclusion

So the solution at c = 0.02 with parameter estimates (-7.4, 0.56, 0.37, -0.19) seems to make the most sense.

Notes:

• The book makes a big deal about standardizing the variables... SAS does this for you in the ridge option.

• Why ridge regression? Estimates tend to be more stable, particularly outside the region of the predictor variables: less affected by small changes in the data. (Ordinary LS estimates can be highly unstable when there is lots of multicollinearity.)

16-18

• Major drawback: ordinary inference procedures don't work so well.

• Other procedures use different penalties, e.g. the "Lasso" penalty.

Background Reading

• KNN sections 10.5 and 11.2

• ridge.sas

16-19
