HETEROSKEDASTICITY
Truong Dang Thuy [email protected]
VIET NAM – NETHERLANDS PROGRAMME FOR M.A. IN DEVELOPMENT ECONOMICS
APPLIED ECONOMETRICS – LECTURE 6
HETEROSKEDASTICITY
One of the assumptions of the classical linear regression model (CLRM) is that the variance of $u_i$, the error term, is constant, or homoskedastic.
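In symbols, the assumption and its violation read:
$$\text{homoskedasticity: } V(u_i) = \sigma^2 \text{ for all } i, \qquad \text{heteroskedasticity: } V(u_i) = \sigma_i^2$$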
REASONS FOR HETEROSKEDASTICITY
The presence of outliers in the data
Incorrect functional form of the regression model
Mixing observations with different measures of scale (such as mixing high-income households with low-income households)
CONSEQUENCES
If heteroskedasticity exists, several consequences ensue:
The OLS estimators are still unbiased and consistent, but they are no longer efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
Thus, the estimators are not best linear unbiased estimators (BLUE); they are merely linear unbiased estimators (LUE).
In the presence of heteroskedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
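As a sketch of the idea behind WLS (a standard textbook formulation, assuming the error variances $\sigma_i^2$ are known, which they rarely are in practice): WLS minimizes the weighted sum of squared residuals, giving less weight to observations with larger error variance,
$$\min_{\beta_0, \beta_1} \; \sum_{i=1}^{n} \frac{(Y_i - \beta_0 - \beta_1 X_i)^2}{\sigma_i^2}$$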
Example
Data: from VHLSS 2010
income: individual annual income (1000 VND)
ysq: individual annual cost for health care (1000 VND)
Use the data in ‘hetero.dta’ to run the regression
$$ysq_i = \beta_0 + \beta_1 \, income_i + u_i$$
DETECTION OF HETEROSKEDASTICITY
Graph squared residuals (or residuals) against predicted Y
Breusch-Pagan (BP) test
White’s test
Other tests such as the Goldfeld-Quandt, Park, and Glejser tests, and Spearman's rank correlation
OLS

. reg ysq income

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |    77415894     1    77415894           Prob > F      =  0.0000
    Residual |   25182.579  3464  7.26979763           R-squared     =  0.9997
-------------+------------------------------           Adj R-squared =  0.9997
       Total |  77441076.6  3465   22349.517           Root MSE      =  2.6963

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   6.09e-07  3263.28   0.000     .0019874    .0019898
       _cons |   2000.798   .0657543  3.0e+04   0.000      2000.67    2000.927
Residuals and Fitted values
predict e, resid
predict yhat1
scatter e yhat1
[Figure: residuals plotted against fitted values]
Scatter of residuals on income
[Figure: residuals plotted against income]
BREUSCH-PAGAN (BP) TEST
Estimate the OLS regression, and obtain the squared OLS residuals from this regression.
Regress the squared residuals on the k regressors included in the model. You can also choose other regressors that might have some bearing on the error variance.
The null hypothesis here is that the error variance is homoskedastic, that is, that all the slope coefficients of this auxiliary regression are simultaneously equal to zero.
Use the F statistic from this regression, with (k - 1) numerator and (n - k) denominator degrees of freedom, to test this hypothesis.
If the computed F statistic is statistically significant, we reject the hypothesis of homoskedasticity. If it is not, we may not reject the null hypothesis.
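For the single-regressor example used in this lecture, the auxiliary regression can be written as (with $\hat{u}_i$ denoting the OLS residuals):
$$\hat{u}_i^2 = \alpha_0 + \alpha_1 \, income_i + v_i, \qquad H_0\colon \alpha_1 = 0$$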
BP test for Heteroskedasticity

gen e2 = e^2   // square the OLS residuals obtained earlier (this step is implied but not shown on the original slide)
reg e2 income
test income

. test income

 ( 1)  income = 0

       F(  1,  3464) =  887.57
            Prob > F =   0.0000
BP test for Heteroskedasticity

. hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         chi2(1)      = 106732.81
         Prob > chi2  =    0.0000

. hettest, rhs fstat

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         F(1 , 3464)  =    887.57
         Prob > F     =    0.0000

With the fstat option, the assumption on the normality of the residuals is removed.
WHITE’S TEST OF HETEROSKEDASTICITY
Regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-products of the regressors.
Obtain the R2 value from this regression and multiply it by the number of observations.
Under the null hypothesis of homoskedasticity, this product follows the chi-square distribution with df equal to the number of regressors in the auxiliary regression (excluding the constant).
The White test is more general and more flexible than the BP test.
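Formally, with $R^2$ taken from the auxiliary regression and $n$ observations, the test statistic is
$$n R^2 \sim \chi^2(p) \quad \text{under } H_0,$$
where $p$ is the number of regressors in the auxiliary regression excluding the constant; in the example below, $p = 2$ (income and its square).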
White’s test

. gen income2 = income^2
. quietly reg e2 income income2
. disp e(r2)*e(N)
2578.1794
. disp chi2tail(2,e(r2)*e(N))
0
White’s test

. quietly reg ysq income
. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(2)      =   2578.18
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df         p
---------------------+-----------------------------
  Heteroskedasticity |    2578.18      2    0.0000
            Skewness |     292.12      1    0.0000
            Kurtosis |       3.25      1    0.0715
---------------------+-----------------------------
               Total |    2873.55      4    0.0000
---------------------------------------------------
REMEDIAL MEASURES
What should we do if we detect heteroskedasticity?
Use the method of weighted least squares (WLS):
Divide each observation by the (heteroskedastic) $\sigma_i$ and estimate the transformed model by OLS (yet the true variance is rarely known).
If the true error variance is proportional to the square of one of the regressors, we can divide both sides of the equation by that variable and run the transformed regression.
Use White's heteroskedasticity-consistent standard errors, also known as robust standard errors:
Valid in large samples.
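To see why the first transformation works (a standard derivation for the simple regression, assuming the $\sigma_i$ were known): dividing the model by $\sigma_i$ gives
$$\frac{Y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}, \qquad V\!\left(\frac{u_i}{\sigma_i}\right) = \frac{\sigma_i^2}{\sigma_i^2} = 1,$$
so the transformed error term is homoskedastic.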
Weighted least squares: weighting by residuals

. gen ysqe = ysq/e
. gen cone = 1/e
. gen incomee = income/e
. reg ysqe incomee cone, nocon

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  2,  3464) =       .
       Model |  5.5442e+14     2  2.7721e+14           Prob > F      =  0.0000
    Residual |  3461.75267  3464  .999351232           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  5.5442e+14  3466  1.5996e+11           Root MSE      =  .99968

        ysqe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     incomee |   .0019886   1.29e-08  1.5e+05   0.000     .0019886    .0019886
        cone |   2000.797   .0010566  1.9e+06   0.000     2000.795    2000.799
Weighted Least Squares

In our case study, we know that
$$V(u_i) = \sigma^2 \, income_i^2$$
We divide all observations by income:
$$\frac{ysq_i}{income_i} = \beta_0 \, \frac{1}{income_i} + \beta_1 + \frac{u_i}{income_i}$$
that is, with $Y_i = ysq_i/income_i$ and $X_i = 1/income_i$,
$$Y_i = \beta_1 + \beta_0 X_i + u_i^*$$
Now
$$V(u_i^*) = V\!\left(\frac{u_i}{income_i}\right) = \frac{V(u_i)}{income_i^2} = \frac{\sigma^2 \, income_i^2}{income_i^2} = \sigma^2,$$
which is homoskedastic.
Weighted Least Squares in Stata

. gen Y = ysq/income
. gen X = 1/income
. reg Y X

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |   12.586798     1   12.586798           Prob > F      =  0.0000
    Residual |  1.4540e-07  3464  4.1975e-11           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  12.5867982  3465  .003632554           Root MSE      = 6.5e-06

           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   2000.002   .0036523  5.5e+05   0.000     1999.995     2000.01
       _cons |   .0019999   1.44e-07  1.4e+04   0.000     .0019996    .0020001
Weighted Least Squares in Stata

. gen wt = 1/(income^2)
. reg ysq income [aw = wt]
(sum of wgt is   5.4203e-06)

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |  5146000.11     1  5146000.11           Prob > F      =  0.0000
    Residual |  92.9775019  3464   .02684108           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |  5146093.08  3465  1485.16395           Root MSE      =  .16383

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019999   1.44e-07  1.4e+04   0.000     .0019996    .0020001
       _cons |   2000.002   .0036523  5.5e+05   0.000     1999.995     2000.01

Weighting by aw = 1/(income^2) is equivalent to dividing every observation by income, so the estimates and standard errors match those of the transformed regression above (with the roles of the slope and the constant swapped).
Robust standard errors

In OLS:
Variance of the error term: $V(u_i) = \sigma^2$
Variance of the estimated coefficients: $V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
Now, in the presence of heteroskedasticity, the error covariance matrix becomes
$$\hat{\Omega} = \begin{pmatrix} \hat{\sigma}_1^2 & 0 & \cdots & 0 \\ 0 & \hat{\sigma}_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\sigma}_n^2 \end{pmatrix}$$
Robust standard errors

The heteroskedasticity-consistent variance of the estimated coefficients is
$$V(\hat{\beta}) = (X'X)^{-1} \, X'\hat{\Omega}X \, (X'X)^{-1}$$
These standard errors are also called:
Robust standard errors
White-Huber standard errors
Sandwich estimator
Robust standard errors in Stata

. reg ysq income

      Source |       SS       df       MS              Number of obs =    3466
-------------+------------------------------           F(  1,  3464) =       .
       Model |    77415894     1    77415894           Prob > F      =  0.0000
    Residual |   25182.579  3464  7.26979763           R-squared     =  0.9997
-------------+------------------------------           Adj R-squared =  0.9997
       Total |  77441076.6  3465   22349.517           Root MSE      =  2.6963

         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   6.09e-07  3263.28   0.000     .0019874    .0019898
       _cons |   2000.798   .0657543  3.0e+04   0.000      2000.67    2000.927

. reg ysq income, robust

Linear regression                                       Number of obs =   3466
                                                        F(  1,  3464) =      .
                                                        Prob > F      = 0.0000
                                                        R-squared     = 0.9997
                                                        Root MSE      = 2.6963

             |               Robust
         ysq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .0019886   5.28e-06   376.77   0.000     .0019783    .0019989
       _cons |   2000.798   .3676633  5441.93   0.000     2000.078    2001.519

The robust option leaves the coefficient estimates unchanged but yields much larger standard errors (e.g., .3676633 versus .0657543 for the constant), so the usual OLS standard errors overstated the precision of the estimates here.