Topic 16 - Multicollinearity: Diagnostics and Remedies
STAT/MATH 571A Professor Lingling An
Outline
- Measures of multicollinearity
- Remedies
Informal diagnostics of multicollinearity:
- regression coefficients change greatly when predictors are included in/excluded from the model
- significant F-test but no significant t-tests for the individual $\beta$'s (ignoring the intercept)
- regression coefficients that don't make sense, i.e., don't match the scatterplot and/or intuition
- Type I and Type II SS very different
- predictors that have high pairwise correlations
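A quick way to check the last diagnostic in SAS is PROC CORR, which prints the pairwise correlations among the predictors. A minimal sketch, using the body-fat data set a1 that is read in later in these notes:

    proc corr data=a1;
      var skinfold thigh midarm;  /* pairwise correlations among the predictors only */
    run;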
Measures of multicollinearity
Variance Inflation Factor

The VIF is related to the variance of the estimated regression coefficients,

$$\sigma^2\{\mathbf{b}\} = \sigma^2 (X'X)^{-1}, \qquad \mathbf{b} = (b_0, b_1, \ldots, b_{p-1})'.$$

For the standardized coefficients $\mathbf{b}^* = \left( \frac{s_1}{s_y} b_1, \ldots, \frac{s_{p-1}}{s_y} b_{p-1} \right)'$ it can be shown that

$$\sigma^2\{\mathbf{b}^*\} = (\sigma^*)^2 \, r_{XX}^{-1},$$

where $r_{XX}$ is the correlation matrix of the $p-1$ predictor variables:

$$r_{XX} = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1,p-1} \\
r_{21} & 1 & \cdots & r_{2,p-1} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p-1,1} & r_{p-1,2} & \cdots & 1
\end{pmatrix}.$$
Variance Inflation Factor (cont.)
- $VIF_k$ is the $k$th diagonal element of $r_{XX}^{-1}$.
- It can be shown that $VIF_k = (1 - R_k^2)^{-1}$, where $R_k^2$ is the coefficient of multiple determination when $X_k$ is regressed on the $p-2$ other X variables in the model.
- Special case: if there are only two X variables, $R_1^2 = R_2^2 = r_{12}^2$.
- $VIF_k = 1$ when $R_k^2 = 0$, i.e., when $X_k$ is not linearly related to the other X variables.
- When $R_k^2 \neq 0$, then $VIF_k > 1$, indicating an inflated variance for $b_k^*$ as a result of multicollinearity among the X variables.
- When $X_k$ has a perfect linear association with the other X variables in the model, so that $R_k^2 = 1$, then $VIF_k$ and $\sigma^2\{b_k^*\}$ are unbounded.
- Rule of thumb: a $VIF_k$ of 10 or more suggests strong multicollinearity.
- The mean of the VIF values also provides information about the severity of the multicollinearity; that is, compare the mean VIF to 1:

$$\overline{VIF} = \frac{\sum_k VIF_k}{p-1}.$$

Mean VIF values considerably larger than 1 are indicative of serious multicollinearity problems.
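To see what the rule of thumb means in terms of the auxiliary regression (a worked example, not from the original slides):

$$VIF_k \geq 10 \iff R_k^2 \geq 1 - \tfrac{1}{10} = 0.90,$$

so the VIF-of-10 rule flags a predictor exactly when 90% or more of its variation is explained by the other predictors.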
Tolerance
- Equivalent to the VIF: Tolerance (TOL) = 1/VIF.
- Trouble if TOL < 0.1.
- SAS gives VIF and TOL results for each predictor; use either VIF or TOL.
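For example, using the VIF that appears in the SAS output below for skinfold, $TOL = 1/708.84 \approx 0.00141$, far below the 0.1 cutoff.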
Example (page 256)
- Twenty healthy female subjects.
- Y is body fat, measured via underwater weighing.
- Underwater weighing is expensive/difficult.
- X1 is triceps skinfold thickness, X2 is thigh circumference, X3 is midarm circumference.
Output

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        396.98461     132.32820     21.52   <.0001
Error             16         98.40489       6.15031
Corrected Total   19        495.38950

Root MSE          2.47998   R-Square   0.8014
Dependent Mean   20.19500   Adj R-Sq   0.7641
Coeff Var        12.28017
SAS code for VIF and TOL:

    proc reg data=a1;
      model fat=skinfold thigh midarm / vif tol;
    run;

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
Intercept    1            117.08469         99.78240      1.17     0.2578           .            0
skinfold     1              4.33409          3.01551      1.44     0.1699     0.00141    708.84291
thigh        1             -2.85685          2.58202     -1.11     0.2849     0.00177    564.34339
midarm       1             -2.18606          1.59550     -1.37     0.1896     0.00956    104.60601
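The VIF values can also be reproduced from their definition via an auxiliary regression. A minimal sketch (not part of the original slides):

    proc reg data=a1;
      model skinfold=thigh midarm;  /* regress X1 on the other predictors */
    run;
    /* VIF for skinfold = 1/(1 - R-Square from this fit), about 708.84 here */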
Some remedial measures
- If the model is a polynomial regression model, use centered variables.
- Drop one or more predictor variables (i.e., variable selection).
  - Standard errors of the parameter estimates decrease.
  - However, how can we tell whether the dropped variable(s) contained any useful information?
  - If a dropped variable is important, the remaining parameter estimates become biased.
- Sometimes, additional observations can be designed to break the multicollinearity.
- Get coefficient estimates using additional data from other contexts. For instance, if the model is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,$$

and you have an estimator $b_1$ for $\beta_1$ based on another data set, you can estimate $\beta_2$ by regressing the adjusted variable $Y_i' = Y_i - b_1 X_{i1}$ on $X_{i2}$. (Common example: in economics, using cross-sectional data to estimate parameters for a time-dependent model.)
- Use the first few principal components (or factor loadings) of the predictor variables. (Limitation: may lose interpretability.)
- Note: all these methods still use ordinary least squares; an alternative is to modify the method of least squares itself, via biased regression or coefficient shrinkage (example: ridge regression).
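A minimal SAS sketch of the adjusted-variable idea, assuming a hypothetical external estimate b1 = 0.5 for the skinfold coefficient (the value and the names a2/yadj are illustrative only):

    data a2;
      set a1;
      yadj = fat - 0.5*skinfold;  /* remove the X1 effect using the hypothetical external b1 = 0.5 */
    run;
    proc reg data=a2;
      model yadj=thigh;           /* estimate beta2 from the adjusted response */
    run;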
Ridge Regression
- A modification of least squares that overcomes the multicollinearity problem.
- Recall that least squares suffers because $(X'X)$ is almost singular, resulting in highly unstable parameter estimates.
- Ridge regression results in biased but more stable estimates.
Consider the correlation transformation, so the normal equations are given by

$$r_{XX}\, \mathbf{b} = r_{YX}.$$

Since $r_{XX}$ is difficult to invert (nearly singular), we add a bias constant $c$:

$$\mathbf{b}^R = (r_{XX} + cI)^{-1} r_{YX}.$$
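As a sketch of how this formula could be applied directly, here is a minimal PROC IML version for the body-fat data, assuming the data set a1 from the example; c = 0.02 anticipates the choice made below, and the estimates come out on the standardized scale (SAS's ridge= option reports them back-transformed to the original scale):

    proc iml;
    use a1;
    read all var {skinfold thigh midarm} into X;
    read all var {fat} into y;
    close a1;
    n  = nrow(X);
    xm = X[:,];                                          /* column means */
    sx = sqrt((X - repeat(xm,n,1))[##,] / (n-1));        /* column std deviations */
    Z  = (X - repeat(xm,n,1)) / (sqrt(n-1)*repeat(sx,n,1));  /* correlation transform */
    zy = (y - y[:]) / (sqrt(n-1)*sqrt(ssq(y - y[:])/(n-1)));
    rXX = Z`*Z;                                          /* correlation matrix of the Xs */
    rYX = Z`*zy;                                         /* correlations of Y with the Xs */
    c   = 0.02;                                          /* bias constant */
    bR  = inv(rXX + c*I(3)) * rYX;                       /* standardized ridge estimates */
    print bR;
    quit;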
Choice of c
- Key to the approach is the choice of c.
- Common to use the ridge trace and the VIFs.
  - Ridge trace: simultaneous plot of the p-1 parameter estimates for different values of c ≥ 0. Curves may fluctuate widely when c is close to zero but eventually stabilize and slowly converge to 0.
  - VIFs tend to fall quickly as c moves away from zero and then change only moderately after that.
- Choose c where things tend to stabilize.
- SAS has the option ridge=c;
SAS Commands

    options nocenter ls=72;
    goptions colors=(none);
    data a1;
      infile 'C:\datasets\Ch07ta01.txt';
      input skinfold thigh midarm fat;
    proc print data=a1;
    run;

    proc reg data=a1 outest=bfout;
      model fat=skinfold thigh midarm / ridge=0 to .1 by .003 outvif;
    run;
    symbol1 v=none i=sm5 l=1;  /* plot-symbol value lost in extraction; v=none assumed */
    symbol2 v=none i=sm5 l=2;
    symbol3 v=none i=sm5 l=3;
    proc gplot;
      plot (skinfold thigh midarm)*_ridge_ / overlay vref=0;
    run; quit;
Ridge Trace
Each value of c gives different values of the parameter estimates. (Note the instability of the estimate values for small c.)
[Figure: VIF values plotted against the ridge constant c]
 Obs   _RIDGE_   skinfold     thigh    midarm
   2     0.000    708.843   564.343   104.606
   4     0.002     50.559    40.448     8.280
   6     0.004     16.982    13.725     3.363
   8     0.006      8.503     6.976     2.119
  10     0.008      5.147     4.305     1.624
  12     0.010      3.486     2.981     1.377
  14     0.012      2.543     2.231     1.236
  16     0.014      1.958     1.764     1.146
  18     0.016      1.570     1.454     1.086
  20     0.018      1.299     1.238     1.043
  22     0.020      1.103     1.081     1.011
  24     0.022      0.956     0.963     0.986
  26     0.024      0.843     0.872     0.966
  28     0.026      0.754     0.801     0.949
  30     0.028      0.683     0.744     0.935
  32     0.030      0.626     0.697     0.923

Note that at _RIDGE_ = 0.020, the VIFs are close to 1.
 Obs   _RIDGE_    _RMSE_   Intercept   skinfold      thigh    midarm
   3     0.000   2.47998     117.085    4.33409   -2.85685    -2.186
   5     0.002   2.54921      22.277    1.46445   -0.40119    -0.673
   7     0.004   2.57173       7.725    1.02294   -0.02423    -0.440
   9     0.006   2.58174       1.842    0.84372    0.12820    -0.346
  11     0.008   2.58739      -1.331    0.74645    0.21047    -0.294
  13     0.010   2.59104      -3.312    0.68530    0.26183    -0.261
  15     0.012   2.59360      -4.661    0.64324    0.29685    -0.239
  17     0.014   2.59551      -5.637    0.61249    0.32218    -0.222
  19     0.016   2.59701      -6.373    0.58899    0.34131    -0.210
  21     0.018   2.59822      -6.946    0.57042    0.35623    -0.199
  23     0.020   2.59924      -7.403    0.55535    0.36814    -0.191
  25     0.022   2.60011      -7.776    0.54287    0.37786    -0.184
  27     0.024   2.60087      -8.083    0.53233    0.38590    -0.178
  29     0.026   2.60156      -8.341    0.52331    0.39265    -0.173
  31     0.028   2.60218      -8.559    0.51549    0.39837    -0.169
  33     0.030   2.60276      -8.746    0.50864    0.40327    -0.165

Note that at _RIDGE_ = 0.020, the RMSE has increased by only about 5% (so the SSE by about 10%, since $1.05^2 \approx 1.10$), and the parameter estimates are closer to making sense.
Conclusion

So the solution at c = 0.02, with parameter estimates (-7.4, 0.56, 0.37, -0.19), seems to make the most sense.

Notes:
- The book makes a big deal about standardizing the variables; SAS does this for you in the ridge option.
- Why ridge regression? Estimates tend to be more stable, particularly outside the region of the predictor variables: less affected by small changes in the data. (Ordinary LS estimates can be highly unstable when there is lots of multicollinearity.)
- Major drawback: ordinary inference procedures don't work so well.
- Other procedures use different penalties, e.g., the Lasso penalty.
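For reference, a lasso fit can be obtained in SAS with PROC GLMSELECT. A minimal sketch on the same data (this procedure is not part of the original slides):

    proc glmselect data=a1;
      model fat=skinfold thigh midarm / selection=lasso;  /* lasso-penalized coefficient path */
    run;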
Background Reading
- KNN sections 10.5 and 11.2
- ridge.sas