HETEROSKEDASTICITY Truong Dang Thuy [email protected] VIET NAM – NETHERLANDS PROGRAMME FOR M.A. IN DEVELOPMENT ECONOMICS APPLIED ECONOMETRICS – LECTURE 6


Page 1

HETEROSKEDASTICITY

Truong Dang Thuy [email protected]

VIET NAM – NETHERLANDS PROGRAMME FOR M.A. IN DEVELOPMENT ECONOMICS

APPLIED ECONOMETRICS – LECTURE 6

Page 2

HETEROSKEDASTICITY

One of the assumptions of the classical linear regression model (CLRM) is that the variance of u_i, the error term, is constant, or homoskedastic.

Page 3

REASONS FOR HETEROSKEDASTICITY

The presence of outliers in the data

Incorrect functional form of the regression model

Mixing observations with different scales of measurement (such as mixing high-income households with low-income households)

Page 4

CONSEQUENCES

If heteroskedasticity exists, several consequences ensue:

The OLS estimators are still unbiased and consistent, but they are no longer efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).

Thus the estimators are not best linear unbiased estimators (BLUE); they are merely linear unbiased estimators (LUE).

In the presence of heteroskedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
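The first consequence can be checked with a small Monte Carlo exercise. The Python sketch below uses simulated data (not the lecture's dataset; all names are illustrative): even though the error variance grows with x, the OLS slope estimates stay centred on the true value.

```python
# Monte Carlo sketch: OLS remains unbiased under heteroskedasticity.
import numpy as np

rng = np.random.default_rng(3)
n, reps, true_slope = 200, 500, 0.5

slopes = []
for _ in range(reps):
    x = rng.uniform(1, 10, n)
    u = rng.normal(0, x)                    # error s.d. grows with x
    y = 1.0 + true_slope * x + u
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes.append(beta[1])

print(np.mean(slopes))                      # close to the true slope 0.5
```

The average of the 500 slope estimates is close to 0.5; what heteroskedasticity damages is not the estimates themselves but the reliability of the usual standard errors.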

Page 5

Example

Data: from VHLSS 2010

income: individual annual income (1,000 VND)

ysq: individual annual health-care expenditure (1,000 VND)

Use the data in 'hetero.dta' to run the regression

   ysq_i = β0 + β1 income_i + u_i

Page 6

DETECTION OF HETEROSKEDASTICITY

Graph squared residuals (or residuals) against predicted Y

Breusch-Pagan (BP) test

White's test

Other tests such as the Goldfeld-Quandt, Park, and Glejser tests, and Spearman's rank correlation

Page 7

OLS

. reg ysq income

      Source |       SS         df       MS          Number of obs =    3466
       Model |   77415894        1    77415894       F(1, 3464)    =       .
    Residual |  25182.579     3464   7.26979763      Prob > F      =  0.0000
       Total | 77441076.6     3465    22349.517      R-squared     =  0.9997
                                                     Adj R-squared =  0.9997
                                                     Root MSE      =  2.6963

         ysq |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      income |  .0019886   6.09e-07   3263.28  0.000     .0019874    .0019898
       _cons |  2000.798   .0657543   3.0e+04  0.000      2000.67    2000.927

Page 8

Residuals and Fitted values

predict e, resid
predict yhat1
scatter e yhat1

[Figure: scatter of Residuals (-100 to 50) against Fitted values (2000 to 4000)]

Page 9

Scatter of residuals on income

[Figure: scatter of Residuals (-100 to 50) against income (0 to 1.0e+06)]

Page 10

BREUSCH-PAGAN (BP) TEST

Estimate the OLS regression and obtain the squared OLS residuals from this regression.

Regress the squared residuals on the k regressors included in the model. You can also choose other regressors that might have some bearing on the error variance.

The null hypothesis is that the error variance is homoskedastic, that is, that all the slope coefficients are simultaneously equal to zero.

Use the F statistic from this regression, with (k-1) numerator and (n-k) denominator df, to test this hypothesis.

If the computed F statistic is statistically significant, we reject the hypothesis of homoskedasticity; if it is not, we do not reject the null hypothesis.
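The steps above can be sketched numerically outside Stata. The following Python snippet (simulated data with numpy, not the lecture's hetero.dta; names are illustrative) runs the auxiliary regression of the squared residuals on the regressor and computes the LM form of the BP statistic, n·R², rather than the F form described above.

```python
# Breusch-Pagan test sketch on simulated heteroskedastic data.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, x)      # error s.d. grows with x

X = np.column_stack([np.ones(n), x])

# Step 1: OLS, then squared residuals
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2

# Step 2: regress the squared residuals on the regressors
g = np.linalg.lstsq(X, e2, rcond=None)[0]
ssr = np.sum((e2 - X @ g) ** 2)
sst = np.sum((e2 - e2.mean()) ** 2)
r2 = 1 - ssr / sst

# Step 3: LM statistic n * R^2 ~ chi2(1) under homoskedasticity;
# compare with the 5% critical value 3.84
bp_lm = n * r2
print(bp_lm)
```

With this data-generating process the statistic far exceeds 3.84, so homoskedasticity is rejected, as it should be.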

Page 11

BP test for Heteroskedasticity

gen e2 = e^2        // squared residuals from the OLS regression (assumed step; e2 is used below)
reg e2 income
test income

. test income

 ( 1)  income = 0

       F(  1,  3464) =  887.57
            Prob > F =    0.0000

Page 12

BP test for Heteroskedasticity

. hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         chi2(1)      = 106732.81
         Prob > chi2  =    0.0000

. hettest, rhs fstat          (assumption on the normality of residuals removed)

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: income

         F(1 , 3464)  =    887.57
         Prob > F     =    0.0000

Page 13

WHITE’S TEST OF HETEROSKEDASTICITY

Regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-product terms of the regressors.

Obtain the R² value from this regression and multiply it by the number of observations.

Under the null hypothesis of homoskedasticity, this product follows the chi-square distribution with df equal to the number of regressors in the auxiliary regression (excluding the intercept).

The White test is more general and more flexible than the BP test.
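The same recipe can be sketched in Python (simulated data, not the lecture's dataset). With a single regressor there are no distinct cross-product terms, so the auxiliary regression uses x and x² only, matching the chi2(2) statistic shown on the next slides.

```python
# White's test sketch: n * R^2 from the auxiliary regression.
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 5, n)
y = 1.0 + 2.0 * x + rng.normal(0, x ** 2)     # error s.d. grows with x^2

X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
e2 = e ** 2

# Auxiliary regression of e^2 on the regressor and its square
A = np.column_stack([np.ones(n), x, x ** 2])
g = np.linalg.lstsq(A, e2, rcond=None)[0]
r2 = 1 - np.sum((e2 - A @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# n * R^2 ~ chi2(2) under homoskedasticity (5% critical value 5.99)
white_stat = n * r2
print(white_stat)
```

A statistic above the chi2(2) critical value rejects homoskedasticity, which is what happens here by construction.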

Page 14

White’s test

. gen income2 = income^2

. quietly reg e2 income income2

. disp e(r2)*e(N)
2578.1794

. disp chi2tail(2,e(r2)*e(N))
0

Page 15

White’s test

. quietly reg ysq income

. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(2)      =   2578.18
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

               Source |       chi2     df      p
   Heteroskedasticity |    2578.18      2   0.0000
             Skewness |     292.12      1   0.0000
             Kurtosis |       3.25      1   0.0715
                Total |    2873.55      4   0.0000

Page 16

REMEDIAL MEASURES

What should we do if we detect heteroskedasticity?

Use the method of weighted least squares (WLS):

Divide each observation by the (heteroskedastic) σi and estimate the transformed model by OLS (but the true variance is rarely known).

If the true error variance is proportional to the square of one of the regressors, we can divide both sides of the equation by that variable and run the transformed regression.

Use White's heteroskedasticity-consistent standard errors (robust standard errors): valid in large samples.

Page 17

Weighted least squares

Weighting by residuals:

. gen ysqe = ysq/e
. gen cone = 1/e
. gen incomee = income/e

. reg ysqe incomee cone, nocon

      Source |       SS         df       MS          Number of obs =    3466
       Model | 5.5442e+14        2   2.7721e+14      F(2, 3464)    =       .
    Residual | 3461.75267     3464   .999351232      Prob > F      =  0.0000
       Total | 5.5442e+14     3466   1.5996e+11      R-squared     =  1.0000
                                                     Adj R-squared =  1.0000
                                                     Root MSE      =  .99968

        ysqe |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     incomee |  .0019886   1.29e-08   1.5e+05  0.000     .0019886    .0019886
        cone |  2000.797   .0010566   1.9e+06  0.000     2000.795    2000.799

Page 18

Weighted Least Squares

In our case study, we know that

   V(u_i) = σ² · income_i²

We divide all observations by income:

   ysq/income = β0 (1/income) + β1 + u/income

that is, Y = β0 X + β1 + v, with Y = ysq/income and X = 1/income.

Now

   V(u_i / income_i) = V(u_i) / income_i² = σ² · income_i² / income_i² = σ²

so the transformed error is homoskedastic.
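The transformation above can be verified numerically. This Python sketch uses simulated data with V(u_i) = σ²·x_i², where x stands in for income (assumed setup, not the VHLSS data): dividing through by x makes the error homoskedastic and recovers both coefficients.

```python
# WLS by dividing through by the regressor.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1, 10, n)
y = 3.0 + 1.5 * x + rng.normal(0, x)       # Var(u_i) = sigma^2 * x_i^2

# Transform: y/x = b0 * (1/x) + b1 + v, where v = u/x is homoskedastic
Y = y / x
Xt = np.column_stack([1.0 / x, np.ones(n)])
b = np.linalg.lstsq(Xt, Y, rcond=None)[0]  # b[0] ~ intercept 3.0, b[1] ~ slope 1.5
print(b)
```

In the transformed regression the roles swap, exactly as on the next slide: the coefficient on 1/x estimates the original intercept, and the constant estimates the original slope.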

Page 19

Weighted Least Squares in Stata

. gen Y = ysq/income
. gen X = 1/income

. reg Y X

      Source |       SS         df       MS          Number of obs =    3466
       Model |  12.586798        1    12.586798      F(1, 3464)    =       .
    Residual | 1.4540e-07     3464   4.1975e-11      Prob > F      =  0.0000
       Total | 12.5867982     3465   .003632554      R-squared     =  1.0000
                                                     Adj R-squared =  1.0000
                                                     Root MSE      = 6.5e-06

           Y |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
           X |  2000.002   .0036523   5.5e+05  0.000     1999.995     2000.01
       _cons |  .0019999   1.44e-07   1.4e+04  0.000     .0019996    .0020001

Page 20

Weighted Least Squares in Stata

. gen wt = 1/(income^2)

. reg ysq income [aw = wt]
(sum of wgt is 5.4203e-06)

      Source |       SS         df       MS          Number of obs =    3466
       Model | 5146000.11        1   5146000.11      F(1, 3464)    =       .
    Residual | 92.9775019     3464     .02684108     Prob > F      =  0.0000
       Total | 5146093.08     3465   1485.16395      R-squared     =  1.0000
                                                     Adj R-squared =  1.0000
                                                     Root MSE      =  .16383

         ysq |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      income |  .0019999   1.44e-07   1.4e+04  0.000     .0019996    .0020001
       _cons |  2000.002   .0036523   5.5e+05  0.000     1999.995     2000.01

Page 21

Robust standard errors

In OLS, the error variance is constant:

   V(u_i) = σ²

and the variance of the estimated coefficients is

   V(β̂) = σ² (X′X)⁻¹

Now, in the presence of heteroskedasticity, the error covariance matrix is no longer σ²I but

   Σ = diag(σ̂1², σ̂2², ..., σ̂n²)

a diagonal matrix with the individual error variances σ̂1², σ̂2², ..., σ̂n² on the diagonal and zeros elsewhere.

Page 22

Robust standard errors

The heteroskedasticity-consistent variance of the estimated coefficients is

   V(β̂) = (X′X)⁻¹ X′ΣX (X′X)⁻¹

These standard errors are also called:

Robust standard errors

White-Huber standard errors

The sandwich estimator.
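The sandwich formula can be computed directly. Below is a minimal numpy sketch (the HC0 form, on simulated data; all names are illustrative) that builds (X′X)⁻¹ X′Σ̂X (X′X)⁻¹ with Σ̂ = diag(e_i²), the squared OLS residuals.

```python
# White (HC0) sandwich estimator of the coefficient covariance.
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 2.0 + 1.0 * x + rng.normal(0, x)       # heteroskedastic errors
X = np.column_stack([np.ones(n), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

bread = np.linalg.inv(X.T @ X)             # "bread": (X'X)^{-1}
meat = X.T @ (X * (e ** 2)[:, None])       # "meat":  X' diag(e_i^2) X
V_robust = bread @ meat @ bread            # the sandwich
robust_se = np.sqrt(np.diag(V_robust))

# Conventional OLS variance for comparison: s^2 (X'X)^{-1}
s2 = e @ e / (n - 2)
ols_se = np.sqrt(np.diag(s2 * bread))
print(robust_se, ols_se)
```

This is the matrix Stata's `robust` option estimates (with a small finite-sample adjustment), which is why only the standard errors, not the coefficients, change on the next slide.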

Page 23

Robust standard errors in Stata

. reg ysq income, robust

Linear regression                                   Number of obs =    3466
                                                    F(1, 3464)    =       .
                                                    Prob > F      =  0.0000
                                                    R-squared     =  0.9997
                                                    Root MSE      =  2.6963

             |             Robust
         ysq |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      income |  .0019886   5.28e-06    376.77  0.000     .0019783    .0019989
       _cons |  2000.798   .3676633   5441.93  0.000     2000.078    2001.519

For comparison, the same regression with conventional standard errors:

. reg ysq income

      Source |       SS         df       MS          Number of obs =    3466
       Model |   77415894        1    77415894       F(1, 3464)    =       .
    Residual |  25182.579     3464   7.26979763      Prob > F      =  0.0000
       Total | 77441076.6     3465    22349.517      R-squared     =  0.9997
                                                     Adj R-squared =  0.9997
                                                     Root MSE      =  2.6963

         ysq |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      income |  .0019886   6.09e-07   3263.28  0.000     .0019874    .0019898
       _cons |  2000.798   .0657543   3.0e+04  0.000      2000.67    2000.927