Multicollinearity: Diagnostics and Remedies


Topic 16 - Multicollinearity: diagnostics and remedies

STAT/MATH 571A Professor Lingling An


Outline
- Measures of multicollinearity
- Remedies


Informal diagnostics of multicollinearity

regression coefficients change greatly when predictors are included in or excluded from the model

significant F-test but no significant t-tests for the individual βs (ignoring the intercept)

regression coefficients that don't make sense, i.e., don't match the scatterplot and/or intuition

Type I and Type II SS very different

predictors that have large pairwise correlations

Measures of multicollinearity


Variance Inflation Factor

The VIF is related to the variance of the estimated regression coefficients, $\sigma^2\{b\} = \sigma^2 (X'X)^{-1}$, where $b = (b_0, b_1, \ldots, b_{p-1})'$.

For the standardized coefficients $b^* = \left(\frac{s_1}{s_Y} b_1, \ldots, \frac{s_{p-1}}{s_Y} b_{p-1}\right)'$ it can be shown that
$$\sigma^2\{b^*\} = (\sigma^*)^2 \, r_{XX}^{-1},$$
where $r_{XX}$ is the correlation matrix of the $p-1$ predictor variables:
$$r_{XX} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{pmatrix}$$
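For reference, the standardized coefficients above come from the correlation transformation (used again in the ridge regression slides below); a standard statement of it is
$$Y_i^* = \frac{1}{\sqrt{n-1}} \left(\frac{Y_i - \bar{Y}}{s_Y}\right), \qquad X_{ik}^* = \frac{1}{\sqrt{n-1}} \left(\frac{X_{ik} - \bar{X}_k}{s_k}\right), \quad k = 1, \ldots, p-1.$$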


Variance Inflation Factor

$(VIF)_k$ is the $k$th diagonal element of $r_{XX}^{-1}$.

Can show $(VIF)_k = (1 - R_k^2)^{-1}$, where $R_k^2$ is the coefficient of multiple determination when $X_k$ is regressed on the $p-2$ other $X$ variables in the model.

Special case: if there are only two $X$ variables, $R_1^2 = R_2^2 = r_{12}^2$.

So $(VIF)_k = 1$ when $R_k^2 = 0$, i.e., when $X_k$ is not linearly related to the other $X$ variables.

When $R_k^2 \neq 0$, then $(VIF)_k > 1$, indicating an inflated variance for $b_k$ as a result of multicollinearity among the $X$ variables.

When $X_k$ has a perfect linear association with the other $X$ variables in the model, so that $R_k^2 = 1$, then $(VIF)_k$ and $\sigma^2\{b_k^*\}$ are unbounded.

Rule of thumb: a $(VIF)_k$ of 10 or more suggests strong multicollinearity.

The mean of the VIF values also provides information about the severity of the multicollinearity; that is, compare the mean VIF to 1, where
$$\overline{VIF} = \frac{\sum_{k=1}^{p-1} (VIF)_k}{p-1}.$$

mean VIF values considerably larger than 1 are indicative of serious multicollinearity problems

Tolerance

Equivalent to VIF

Tolerance (TOL) = 1/VIF

Trouble if TOL < .1

SAS gives VIF and TOL results for each predictor

Use either VIF or TOL
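As a small illustration of the two measures above, the sketch below (an assumption-laden example: it presumes the predictors already sit in a SAS data set a1 with the variable names of the next example, and an IML release that provides the CORR function) computes each $(VIF)_k$ as the $k$th diagonal element of the inverse correlation matrix and the tolerance as its reciprocal.

proc iml;
  use a1;
    read all var {skinfold thigh midarm} into X;  /* the p-1 predictor columns */
  close a1;
  rXX = corr(X);                /* correlation matrix of the predictors        */
  VIF = vecdiag(inv(rXX));      /* (VIF)_k = kth diagonal element of inv(rXX)  */
  TOL = 1/VIF;                  /* tolerance = 1/VIF = 1 - R_k^2               */
  print VIF TOL;
quit;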


Example (page 256)

Twenty healthy female subjects

Y is body fat via underwater weighing

Underwater weighing expensive/dicult

X1 is triceps skinfold thickness

X2 is thigh circumference

X3 is midarm circumference


Output

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        396.98461     132.32820     21.52   <.0001
Error             16         98.40489       6.15031
Corrected Total   19        495.38950

Root MSE          2.47998    R-Square    0.8014
Dependent Mean   20.19500    Adj R-Sq    0.7641
Coeff Var        12.28017

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1          117.08469          99.78240       1.17     0.2578
skinfold     1            4.33409           3.01551       1.44     0.1699
thigh        1           -2.85685           2.58202      -1.11     0.2849
midarm       1           -2.18606           1.59550      -1.37     0.1896


SAS code for VIF and TOL

proc reg data=a1;
  model fat=skinfold thigh midarm / vif tol;
run;

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
Intercept    1          117.08469          99.78240       1.17     0.2578     .                  0
skinfold     1            4.33409           3.01551       1.44     0.1699     0.00141      708.84291
thigh        1           -2.85685           2.58202      -1.11     0.2849     0.00177      564.34339
midarm       1           -2.18606           1.59550      -1.37     0.1896     0.00956      104.60601


Some remedial measures

If the model is a polynomial regression model, use centered variables (e.g., fit the model in terms of $x_i - \bar{x}$ and $(x_i - \bar{x})^2$ rather than $x_i$ and $x_i^2$).

Drop one or more predictor variables (i.e., variable selection).
 - Standard errors of the parameter estimates decrease.
 - However, how can we tell whether the dropped variable(s) carried any useful information?
 - If a dropped variable is important, the remaining parameter estimates become biased.

Sometimes, observations can be designed to break the multicollinearity.

Get coefficient estimates from additional data from other contexts. For instance, if the model is
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,$$
and you have an estimator $b_1$ (for $\beta_1$) based on another data set, you can estimate $\beta_2$ by regressing the adjusted variable $Y_i^* = Y_i - b_1 X_{i1}$ on $X_{i2}$. (Common example: in economics, using cross-sectional data to estimate parameters for a time-dependent model.)

Use the first few principal components (or factor loadings) of the predictor variables. (Limitation: may lose interpretability.)

Note: all these methods use the ordinary least squares method; an alternative is to modify the method of least squares - biased regression or coefficient shrinkage (example: ridge regression).
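A minimal sketch of the "borrowed estimate" idea above (a hypothetical data set dat with variables y, x1, and x2, and an externally obtained estimate b1 = 2.5; the names and the value are illustrative, not from the slides):

data adj;
  set dat;
  ystar = y - 2.5*x1;    /* remove the externally estimated effect of x1 */
run;

proc reg data=adj;
  model ystar = x2;      /* the slope here estimates beta2               */
run;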

Ridge Regression

Modification of least squares that overcomes the multicollinearity problem

Recall least squares suffers because $(X'X)$ is almost singular, thereby resulting in highly unstable parameter estimates

Ridge regression results in biased but more stable estimates

Consider the correlation transformation, so the normal equations are given by $r_{XX} b = r_{YX}$. Since $r_{XX}$ is difficult to invert, we add a bias constant $c$:
$$b^R = (r_{XX} + cI)^{-1} r_{YX}$$
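This formula can be applied directly; below is a minimal IML sketch for the body-fat data with a single illustrative bias constant c = 0.02 (an assumption for illustration only; the resulting coefficients are on the standardized scale):

proc iml;
  use a1;
    read all var {skinfold thigh midarm fat} into M;   /* predictors first, response last */
  close a1;
  R   = corr(M);                   /* correlation matrix of (X1, X2, X3, Y)   */
  p   = ncol(M) - 1;               /* number of predictors                    */
  rXX = R[1:p, 1:p];               /* predictor correlation matrix            */
  rYX = R[1:p, p+1];               /* correlations of each predictor with Y   */
  c   = 0.02;                      /* bias constant                           */
  bR  = inv(rXX + c*I(p)) * rYX;   /* standardized ridge coefficients         */
  print bR;
quit;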

Choice of c

Key to the approach is the choice of c

Common to use the ridge trace and VIFs. Ridge trace: a simultaneous plot of the $p-1$ parameter estimates for different values of $c \ge 0$. Curves may fluctuate widely when $c$ is close to zero but eventually stabilize and slowly converge to 0. VIFs tend to fall quickly as $c$ moves away from zero and then change only moderately after that.

Choose c where things tend to stabilize

SAS has the option ridge=c;
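To see what lies behind the ridge= option, the single-c sketch above can be run over a grid of c values to produce the ridge trace and the corresponding VIFs. The ridge-VIF expression used here, the diagonal of $(r_{XX} + cI)^{-1} r_{XX} (r_{XX} + cI)^{-1}$, is the standard one for the ridge estimator; the grid 0 to 0.03 by 0.002 is illustrative:

proc iml;
  use a1;
    read all var {skinfold thigh midarm fat} into M;
  close a1;
  R   = corr(M);
  p   = ncol(M) - 1;
  rXX = R[1:p, 1:p];
  rYX = R[1:p, p+1];
  do c = 0 to 0.03 by 0.002;
    A   = inv(rXX + c*I(p));
    bR  = A * rYX;                  /* ridge trace: coefficients as c varies */
    vif = vecdiag(A * rXX * A);     /* VIFs of the ridge estimates           */
    print c (bR`) (vif`);
  end;
quit;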

SAS Commands

options nocenter ls=72;
goptions colors=(none);
data a1;
  infile 'C:\datasets\Ch07ta01.txt';
  input skinfold thigh midarm fat;
proc print data=a1;
run;

proc reg data=a1 outest=bfout;
  model fat=skinfold thigh midarm / ridge=0 to .1 by .003 outvif;
run;
symbol1 v=none i=sm5 l=1;
symbol2 v=none i=sm5 l=2;
symbol3 v=none i=sm5 l=3;
proc gplot;
  plot (skinfold thigh midarm)*_ridge_ / overlay vref=0;
run;
quit;


Ridge Trace

Each value of c gives different values of the parameter estimates. (Note the instability of the estimate values for small c.)

Graph of the VIFs versus the ridge constant c:


Obs   _RIDGE_   skinfold     thigh    midarm
  2    0.000     708.843   564.343   104.606
  4    0.002      50.559    40.448     8.280
  6    0.004      16.982    13.725     3.363
  8    0.006       8.503     6.976     2.119
 10    0.008       5.147     4.305     1.624
 12    0.010       3.486     2.981     1.377
 14    0.012       2.543     2.231     1.236
 16    0.014       1.958     1.764     1.146
 18    0.016       1.570     1.454     1.086
 20    0.018       1.299     1.238     1.043
 22    0.020       1.103     1.081     1.011
 24    0.022       0.956     0.963     0.986
 26    0.024       0.843     0.872     0.966
 28    0.026       0.754     0.801     0.949
 30    0.028       0.683     0.744     0.935
 32    0.030       0.626     0.697     0.923

Note that at RIDGE = 0.020, the VIFs are close to 1.


Obs   _RIDGE_    _RMSE_   Intercept   skinfold      thigh   midarm
  3    0.000    2.47998     117.085    4.33409   -2.85685   -2.186
  5    0.002    2.54921      22.277    1.46445   -0.40119   -0.673
  7    0.004    2.57173       7.725    1.02294   -0.02423   -0.440
  9    0.006    2.58174       1.842    0.84372    0.12820   -0.346
 11    0.008    2.58739      -1.331    0.74645    0.21047   -0.294
 13    0.010    2.59104      -3.312    0.68530    0.26183   -0.261
 15    0.012    2.59360      -4.661    0.64324    0.29685   -0.239
 17    0.014    2.59551      -5.637    0.61249    0.32218   -0.222
 19    0.016    2.59701      -6.373    0.58899    0.34131   -0.210
 21    0.018    2.59822      -6.946    0.57042    0.35623   -0.199
 23    0.020    2.59924      -7.403    0.55535    0.36814   -0.191
 25    0.022    2.60011      -7.776    0.54287    0.37786   -0.184
 27    0.024    2.60087      -8.083    0.53233    0.38590   -0.178
 29    0.026    2.60156      -8.341    0.52331    0.39265   -0.173
 31    0.028    2.60218      -8.559    0.51549    0.39837   -0.169
 33    0.030    2.60276      -8.746    0.50864    0.40327   -0.165

Note that at _RIDGE_ = 0.020, the RMSE is increased by only about 5% (so the SSE increases by about 10%), and the parameter estimates are closer to making sense.


Conclusion

So the solution at c = 0.02, with parameter estimates (-7.4, 0.56, 0.37, -0.19), seems to make the most sense. Notes:

The book makes a big deal about standardizing the variables... SAS does this for you in the ridge option.
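For reference, the back-transformation that puts the standardized ridge coefficients on the original scale (the scale shown in the output above) is the standard one for the correlation-transformed model:
$$b_k = \left(\frac{s_Y}{s_k}\right) b_k^R, \quad k = 1, \ldots, p-1, \qquad b_0 = \bar{Y} - \sum_{k=1}^{p-1} b_k \bar{X}_k.$$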

Why ridge regression? Estimates tend to be more stable, particularly outside the region of the predictor variables: less affected by small changes in the data. (Ordinary LS estimates can be highly unstable when there is lots of multicollinearity.)

Major drawback: ordinary inference procedures don't work so well.

Other procedures use different penalties, e.g. the lasso penalty.

Background Reading

KNN Sections 10.5 and 11.2

ridge.sas
