HETEROSKEDASTICITY
Truong Dang Thuy [email protected]
VIET NAM – NETHERLANDS PROGRAMME FOR M.A. IN DEVELOPMENT ECONOMICS
APPLIED ECONOMETRICS – LECTURE 6
HETEROSKEDASTICITY
One of the assumptions of the classical linear regression model (CLRM) is that the variance of $u_i$, the error term, is constant, or homoskedastic.
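In symbols, the assumption and its violation read:
$$\text{homoskedasticity: } V(u_i) = \sigma^2 \text{ for all } i, \qquad \text{heteroskedasticity: } V(u_i) = \sigma_i^2$$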
REASONS FOR HETEROSKEDASTICITY
The presence of outliers in the data
Incorrect functional form of the regression model
Mixing observations with different measures of scale (such as mixing high-income households with low-income households)
CONSEQUENCES
If heteroskedasticity exists, several consequences ensue:
The OLS estimators are still unbiased and consistent, but they are no longer efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
Thus, the estimators are not best linear unbiased estimators (BLUE); they are merely linear unbiased estimators (LUE).
In the presence of heteroskedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
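As a sketch of the idea behind WLS (a standard textbook formulation, assuming the error variances $\sigma_i^2$ are known, which they rarely are in practice): WLS minimizes the weighted sum of squared residuals, giving less weight to observations with larger error variance,
$$\min_{\beta_0, \beta_1} \; \sum_{i=1}^{n} \frac{(Y_i - \beta_0 - \beta_1 X_i)^2}{\sigma_i^2}$$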
Example
Data: from VHLSS 2010
income: individual annual income (1000 VND)
ysq: individual annual cost for health care (1000 VND)
Use the data in ‘hetero.dta’ to run the regression
$$ysq_i = \beta_0 + \beta_1 \, income_i + u_i$$
DETECTION OF HETEROSKEDASTICITY
Graph squared residuals (or residuals) against predicted Y
Breusch-Pagan (BP) test
White’s test
Other tests such as the Goldfeld-Quandt, Park, and Glejser tests, and Spearman's rank correlation
OLS

. reg ysq income

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |    77415894     1    77415894           Prob > F      =  0.0000
    Residual |   25182.579  3464  7.26979763           R-squared     =  0.9997
-------------+------------------------------           Adj R-squared =  0.9997
       Total |  77441076.6  3465   22349.517           Root MSE      =  2.6963

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   6.09e-07  3263.28   0.000     .0019874    .0019898
       _cons |   2000.798   .0657543  3.0e+04   0.000      2000.67    2000.927
Residuals and Fitted values
predict e, resid
predict yhat1
scatter e yhat1
[Figure: residuals plotted against fitted values]
Scatter of residuals on income
[Figure: residuals plotted against income]
BREUSCH-PAGAN (BP) TEST
Estimate the OLS regression, and obtain the squared OLS residuals from this regression.
Regress the squared residuals on the k regressors included in the model. You can also choose other regressors that might have some bearing on the error variance.
The null hypothesis here is that the error variance is homoskedastic, that is, that all the slope coefficients of this auxiliary regression are simultaneously equal to zero.
Use the F statistic from this regression, with (k - 1) numerator and (n - k) denominator degrees of freedom, to test this hypothesis.
If the computed F statistic is statistically significant, we reject the hypothesis of homoskedasticity. If it is not, we may not reject the null hypothesis.
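For the single-regressor example used in this lecture, the auxiliary regression can be written as (with $\hat{u}_i$ denoting the OLS residuals):
$$\hat{u}_i^2 = \alpha_0 + \alpha_1 \, income_i + v_i, \qquad H_0\colon \alpha_1 = 0$$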
BP test for Heteroskedasticity

gen e2 = e^2   // square the OLS residuals obtained earlier (this step is implied but not shown on the original slide)
reg e2 income
test income

. test income

 ( 1)  income = 0

       F(  1,  3464) =  887.57
            Prob > F =   0.0000
BP test for Heteroskedasticity

. hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         chi2(1)      = 106732.81
         Prob > chi2  =    0.0000

. hettest, rhs fstat

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         F(1 , 3464)  =    887.57
         Prob > F     =    0.0000

With the fstat option, the assumption on the normality of the residuals is removed.
WHITE’S TEST OF HETEROSKEDASTICITY
Regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-products of the regressors.
Obtain the R2 value from this regression and multiply it by the number of observations.
Under the null hypothesis of homoskedasticity, this product follows the chi-square distribution with df equal to the number of regressors in the auxiliary regression (excluding the constant).
The White test is more general and more flexible than the BP test.
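Formally, with $R^2$ taken from the auxiliary regression and $n$ observations, the test statistic is
$$n R^2 \sim \chi^2(p) \quad \text{under } H_0,$$
where $p$ is the number of regressors in the auxiliary regression excluding the constant; in the example below, $p = 2$ (income and its square).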
White’s test

. gen income2 = income^2
. quietly reg e2 income income2
. disp e(r2)*e(N)
2578.1794
. disp chi2tail(2,e(r2)*e(N))
0
White’s test

. quietly reg ysq income
. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(2)      =   2578.18
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df         p
---------------------+-----------------------------
  Heteroskedasticity |    2578.18      2    0.0000
            Skewness |     292.12      1    0.0000
            Kurtosis |       3.25      1    0.0715
---------------------+-----------------------------
               Total |    2873.55      4    0.0000
---------------------------------------------------
REMEDIAL MEASURES
What should we do if we detect heteroskedasticity?
Use the method of weighted least squares (WLS):
Divide each observation by the (heteroskedastic) $\sigma_i$ and estimate the transformed model by OLS (yet the true variance is rarely known).
If the true error variance is proportional to the square of one of the regressors, we can divide both sides of the equation by that variable and run the transformed regression.
Use White's heteroskedasticity-consistent standard errors, also known as robust standard errors:
Valid in large samples.
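To see why the first transformation works (a standard derivation for the simple regression, assuming the $\sigma_i$ were known): dividing the model by $\sigma_i$ gives
$$\frac{Y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}, \qquad V\!\left(\frac{u_i}{\sigma_i}\right) = \frac{\sigma_i^2}{\sigma_i^2} = 1,$$
so the transformed error term is homoskedastic.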
Weighted least squares: weighting by residuals

. gen ysqe = ysq/e
. gen cone = 1/e
. gen incomee = income/e
. reg ysqe incomee cone, nocon

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  2,  3464) =       .
       Model |  5.5442e+14     2  2.7721e+14           Prob > F      =  0.0000
    Residual |  3461.75267  3464  .999351232           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  5.5442e+14  3466  1.5996e+11           Root MSE      =  .99968

        ysqe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     incomee |   .0019886   1.29e-08  1.5e+05   0.000     .0019886    .0019886
        cone |   2000.797   .0010566  1.9e+06   0.000     2000.795    2000.799
Weighted Least Squares

In our case study, we know that
$$V(u_i) = \sigma^2 \, income_i^2$$
We divide all observations by income:
$$\frac{ysq_i}{income_i} = \beta_0 \, \frac{1}{income_i} + \beta_1 + \frac{u_i}{income_i}$$
that is, with $Y_i = ysq_i/income_i$ and $X_i = 1/income_i$,
$$Y_i = \beta_1 + \beta_0 X_i + u_i^*$$
Now
$$V(u_i^*) = V\!\left(\frac{u_i}{income_i}\right) = \frac{V(u_i)}{income_i^2} = \frac{\sigma^2 \, income_i^2}{income_i^2} = \sigma^2,$$
which is homoskedastic.
Weighted Least Squares in Stata

. gen Y = ysq/income
. gen X = 1/income
. reg Y X

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |   12.586798     1   12.586798           Prob > F      =  0.0000
    Residual |  1.4540e-07  3464  4.1975e-11           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  12.5867982  3465  .003632554           Root MSE      = 6.5e-06

           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   2000.002   .0036523  5.5e+05   0.000     1999.995     2000.01
       _cons |   .0019999   1.44e-07  1.4e+04   0.000     .0019996    .0020001
Weighted Least Squares in Stata

. gen wt = 1/(income^2)
. reg ysq income [aw = wt]
(sum of wgt is   5.4203e-06)

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |  5146000.11     1  5146000.11           Prob > F      =  0.0000
    Residual |  92.9775019  3464   .02684108           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  5146093.08  3465  1485.16395           Root MSE      =  .16383

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019999   1.44e-07  1.4e+04   0.000     .0019996    .0020001
       _cons |   2000.002   .0036523  5.5e+05   0.000     1999.995     2000.01

Weighting by aw = 1/(income^2) is equivalent to dividing every observation by income, so the estimates and standard errors match those of the transformed regression above (with the roles of the slope and the constant swapped).
Robust standard errors

In OLS:
Variance of the error term: $V(u_i) = \sigma^2$
Variance of the estimated coefficients: $V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
Now, in the presence of heteroskedasticity, the error covariance matrix becomes
$$\hat{\Omega} = \begin{pmatrix} \hat{\sigma}_1^2 & 0 & \cdots & 0 \\ 0 & \hat{\sigma}_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\sigma}_n^2 \end{pmatrix}$$
Robust standard errors

The heteroskedasticity-consistent variance of the estimated coefficients is
$$V(\hat{\beta}) = (X'X)^{-1} \, X'\hat{\Omega}X \, (X'X)^{-1}$$
These standard errors are also called:
Robust standard errors
White-Huber standard errors
Sandwich estimator
Robust standard errors in Stata

. reg ysq income

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |    77415894     1    77415894           Prob > F      =  0.0000
    Residual |   25182.579  3464  7.26979763           R-squared     =  0.9997
-------------+------------------------------           Adj R-squared =  0.9997
       Total |  77441076.6  3465   22349.517           Root MSE      =  2.6963

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   6.09e-07  3263.28   0.000     .0019874    .0019898
       _cons |   2000.798   .0657543  3.0e+04   0.000      2000.67    2000.927

. reg ysq income, robust

Linear regression                                       Number of obs =   3466
                                                        F(  1,  3464) =      .
                                                        Prob > F      = 0.0000
                                                        R-squared     = 0.9997
                                                        Root MSE      = 2.6963

             |               Robust
         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   5.28e-06   376.77   0.000     .0019783    .0019989
       _cons |   2000.798   .3676633  5441.93   0.000     2000.078    2001.519

The robust option leaves the coefficient estimates unchanged but yields much larger standard errors (e.g., .3676633 versus .0657543 for the constant), so the usual OLS standard errors overstated the precision of the estimates here.