Muhammad Ali

Lecturer in Statistics

GPGC Mardan.


Heteroscedasticity

Definition

One of the assumptions of the classical linear regression model is that the error term ε_i has the same variance, σ², for every observation. In most practical situations this assumption is not fulfilled, and we have the problem of heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness and consistency of the ordinary least squares (OLS) estimators, but these estimators no longer have minimum variance. Recall that OLS assumes Var(ε_i) = σ² for all i; that is, the variance of the error term is constant (homoscedasticity). If the error terms do not have constant variance, they are said to be heteroscedastic. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion').
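To see what this definition means in data, here is a minimal simulation sketch; the variable names and the noise scale 0.5·X are purely illustrative assumptions, not part of any particular study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1, 10, n)          # e.g. family income, in arbitrary units

# Heteroscedastic errors: the standard deviation grows with X,
# so Var(eps_i) = (0.5 * X_i)^2 is not constant across observations
eps = rng.normal(0, 0.5 * X)
Y = 2.0 + 3.0 * X + eps            # illustrative "vacation expenditure" model

# Compare the error variance in the low-X and high-X halves of the sample
low, high = X < 5.5, X >= 5.5
print(eps[low].var(), eps[high].var())
```

The sample variance of the errors in the high-X half is several times that of the low-X half, which is exactly the "differing variance" the definition describes.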

When heteroscedasticity might occur/causes of heteroscedasticity

1. Errors may increase as the value of an independent variable increases. For

example, consider a model in which annual family income is the independent

variable and annual family expenditures on vacations is the dependent variable.

Families with low incomes will spend relatively little on vacations, and the


variations in expenditures across such families will be small. But for families

with large incomes, the amount of discretionary income will be higher. The mean

amount spent on vacations will be higher, and there will also be greater variability

among such families, resulting in heteroscedasticity. Note that, in this example, a

high family income is a necessary but not sufficient condition for large vacation

expenditures. Any time a high value for an independent variable is a necessary but

not sufficient condition for an observation to have a high value on a dependent

variable, heteroscedasticity is likely.

2. Other model misspecifications can produce heteroscedasticity. For example, it

may be that instead of using Y, you should be using the log of Y. Instead of using

X, maybe you should be using X², or both X and X². Important variables may be

omitted from the model. If the model were correctly specified, you might find that

the patterns of heteroscedasticity disappeared.

3. As data collection techniques improve, σ_i² is likely to decrease. Thus banks that

have sophisticated data processing equipment are likely to commit fewer errors in

the monthly or quarterly statements of their customers than banks without such

facilities.

4. Heteroscedasticity can also arise as a result of the presence of outliers. An

outlying observation is one that is very different from the other observations in the sample.


5. Error-learning models: as people learn, their errors of behaviour become smaller over time, and so σ_i² is expected to decrease. For example, as the amount of typing practice increases, both the average number of typing errors and their variance decrease.

Consequences of heteroscedasticity

Following are the consequences of heteroscedasticity:

1. Heteroscedasticity does not result in biased parameter estimates. However, OLS

estimates are no longer BLUE. That is, among all the unbiased estimators, OLS

does not provide the estimate with the smallest variance. Depending on the nature of the heteroscedasticity, significance tests can be too liberal or too conservative.

2. In addition, the standard errors are biased when heteroscedasticity is present. This

in turn leads to bias in test statistics and confidence intervals.

3. Fortunately, unless heteroscedasticity is “marked,” significance tests are virtually

unaffected, and thus OLS estimation can be used without concern of serious

distortion. But, severe heteroscedasticity can sometimes be a problem. Warning:

Note that heteroscedasticity can be very problematic with methods besides OLS.

For example, in logistic regression heteroscedasticity can produce biased and

misleading parameter estimates.


OLS estimation in the presence of heteroscedasticity

If we introduce heteroscedasticity by letting E(ε_i²) = σ_i², but retain all the other assumptions of the classical model, the OLS estimates are still unbiased.

Consider the two-variable regression model:

Y_i = β0 + β1 X_i + ε_i

We know that the ordinary least squares estimate of β1 is:

β̂1 = Σx_i y_i / Σx_i²,  where x_i = X_i − X̄ and y_i = Y_i − Ȳ

   = Σx_i (Y_i − Ȳ) / Σx_i²

   = Σx_i Y_i / Σx_i²   (since Ȳ Σx_i = 0)

Substituting Y_i = β0 + β1 X_i + ε_i gives

β̂1 = (β0 Σx_i + β1 Σx_i X_i + Σx_i ε_i) / Σx_i²

   = β1 (Σx_i X_i / Σx_i²) + Σx_i ε_i / Σx_i²   (using Σx_i = 0) ——— (A)

Now

Σx_i X_i = Σ(X_i − X̄) X_i = ΣX_i² − X̄ ΣX_i = Σ(X_i − X̄)² = Σx_i²

so that Σx_i X_i / Σx_i² = 1.

Put this value in equation (A):

β̂1 = β1 + Σx_i ε_i / Σx_i²

Taking expectations and using E(ε_i) = 0 gives E(β̂1) = β1. Similarly, E(β̂0) = β0.

This shows that in the presence of heteroscedasticity the OLS estimators are unbiased.

Variance of OLS estimator in the presence of heteroscedasticity

Since

β̂1 = β1 + Σx_i ε_i / Σx_i² = β1 + Σw_i ε_i,  where w_i = x_i / Σx_i²,

we have

Var(β̂1) = E(β̂1 − β1)² = E(Σw_i ε_i)²

        = E(w_1²ε_1² + w_2²ε_2² + … + w_n²ε_n² + 2w_1w_2ε_1ε_2 + … + 2w_{n−1}w_n ε_{n−1}ε_n)

The cross-product terms equal zero because we know that E(ε_i ε_j) = 0 for i ≠ j, so

Var(β̂1) = w_1²E(ε_1²) + w_2²E(ε_2²) + … + w_n²E(ε_n²)

        = Σw_i² σ_i²

        = Σx_i² σ_i² / (Σx_i²)²

This differs from the result under homoscedasticity, where σ_i² = σ² for all i and the expression reduces to Var(β̂1) = σ² / Σx_i².
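Both results — unbiasedness of β̂1 and the variance Σx_i²σ_i²/(Σx_i²)² — can be checked with a small Monte Carlo experiment. The sketch below is illustrative (the coefficients, noise scale, and sample sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 20000
X = np.linspace(1, 10, n)
x = X - X.mean()                         # deviations from the mean
sigma_i = 0.3 * X                        # heteroscedastic: sd grows with X

# Draw `reps` independent samples of errors and compute the OLS slope each time
eps = rng.normal(0.0, sigma_i, size=(reps, n))
Y = 1.0 + 2.0 * X + eps
b1_hats = (Y * x).sum(axis=1) / (x**2).sum()

# Theoretical variance under heteroscedasticity: sum(x_i^2 sigma_i^2) / (sum x_i^2)^2
var_theory = (x**2 * sigma_i**2).sum() / (x**2).sum()**2

print(b1_hats.mean())                 # close to the true slope 2: still unbiased
print(b1_hats.var(), var_theory)      # the two variances agree
```

The Monte Carlo mean of β̂1 sits at the true slope, while its sampling variance matches the heteroscedasticity formula rather than σ²/Σx_i².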

Tests for Detection of Heteroscedasticity

The following tests can be used to detect heteroscedasticity:

1. Park Test

The Park test suggests that σ_i² is some function of the explanatory variable X_i, i.e.

σ_i² = σ² X_i^β e^(v_i)

or, taking logs,

ln σ_i² = ln σ² + β ln X_i + v_i ——— (i)

Since σ_i² is unknown, Park suggests using û_i² as a proxy and running the following regression:

ln û_i² = ln σ² + β ln X_i + v_i = α + β ln X_i + v_i ——— (ii)

If β is found statistically significant in equation (ii), it means that heteroscedasticity is present in the data; otherwise we may accept the assumption of homoscedasticity.

The Park test is thus a two-stage procedure. In the first stage we run the OLS regression disregarding the heteroscedasticity question and obtain the residuals û_i; in the second stage we run the regression (ii).
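The two-stage procedure can be sketched in a few lines of code; this is an illustrative simulation (the data-generating values are invented for the example), not Park's original application:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(1, 10, n)
Y = 1.0 + 2.0 * X + rng.normal(0, 0.5 * X)   # true sigma_i^2 = 0.25 * X_i^2

# Stage 1: OLS of Y on X, disregarding heteroscedasticity; keep residuals
b1, b0 = np.polyfit(X, Y, 1)
u_hat = Y - (b0 + b1 * X)

# Stage 2: regress ln(u_hat^2) on ln(X); the slope estimates beta in (ii)
lx = np.log(X)
beta, alpha = np.polyfit(lx, np.log(u_hat**2), 1)

# t statistic for H0: beta = 0, computed by hand
resid = np.log(u_hat**2) - (alpha + beta * lx)
s2 = (resid**2).sum() / (n - 2)
se_beta = np.sqrt(s2 / ((lx - lx.mean())**2).sum())
print(beta, beta / se_beta)   # a large |t| signals heteroscedasticity
```

Since the simulated σ_i² is proportional to X_i², the slope estimate lands near 2 and its t statistic is clearly significant, so the test correctly flags heteroscedasticity here.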


2. Glejser Test

The Glejser test is much like the Park test. After obtaining the residuals û_i from the OLS regression, Glejser suggests regressing the absolute values |û_i| on the X variable that is thought to be closely associated with σ_i².

Glejser used the following functional forms:

|û_i| = β1 + β2 X_i + v_i

|û_i| = β1 + β2 √X_i + v_i

|û_i| = β1 + β2 (1/X_i) + v_i

|û_i| = β1 + β2 (1/√X_i) + v_i

|û_i| = √(β1 + β2 X_i) + v_i

|û_i| = √(β1 + β2 X_i²) + v_i

Where v_i is the error term.


Goldfeld and Quandt point out that the error term v_i has some problems in the above expressions:

• Its expected value is not equal to zero.

• It is serially correlated.

• The last two expressions are not linear in the parameters and therefore cannot be estimated with the usual OLS procedure.
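The first (linear) Glejser form can be sketched as follows; the simulated data and coefficient values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(1, 10, n)
Y = 1.0 + 2.0 * X + rng.normal(0, 0.4 * X)   # heteroscedastic errors

# Residuals from the original OLS regression
b1, b0 = np.polyfit(X, Y, 1)
abs_u = np.abs(Y - (b0 + b1 * X))

# First Glejser form: |u_i| = beta1 + beta2 * X_i + v_i
beta2, beta1 = np.polyfit(X, abs_u, 1)

# t statistic for H0: beta2 = 0, computed by hand
fitted = beta1 + beta2 * X
s2 = ((abs_u - fitted)**2).sum() / (n - 2)
se = np.sqrt(s2 / ((X - X.mean())**2).sum())
print(beta2, beta2 / se)   # significant beta2 suggests heteroscedasticity
```

Because the error spread grows with X, the slope β2 in the |û_i| regression comes out positive and significant.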

3. Spearman's Rank Correlation Test.

The well-known Spearman's rank correlation coefficient is given by the following formula:

r_s = 1 − 6 Σd_i² / [n(n² − 1)]

Where d_i = difference between the two rankings and n = number of individuals. This rank correlation coefficient can be used to detect heteroscedasticity.

The procedure for Spearman's rank correlation test is as follows:

i. Fit the regression of Y on X and find the residuals.

ii. Rank the absolute values of the residuals (i.e. ignoring their sign).

iii. Rank the values of X (or of Y).

iv. Find the difference between the two rankings (d_i).

v. Apply the following test statistic to test the hypothesis that the population rank correlation coefficient ρ_s = 0, assuming n > 8, i.e.


t = r_s √(n − 2) / √(1 − r_s²)   with n − 2 degrees of freedom.

If the computed value of t exceeds the tabulated value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it.
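Steps i–v can be sketched as follows, computing the ranks and the t statistic by hand; the simulated data are illustrative:

```python
import numpy as np

def ranks(a):
    # Assign ranks 1..n (ties are unlikely with continuous data)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

rng = np.random.default_rng(3)
n = 60
X = rng.uniform(1, 10, n)
Y = 1.0 + 2.0 * X + rng.normal(0, 0.5 * X)   # heteroscedastic errors

# Steps i-ii: fit the regression of Y on X, rank the absolute residuals
b1, b0 = np.polyfit(X, Y, 1)
abs_resid = np.abs(Y - (b0 + b1 * X))

# Steps iii-iv: rank X as well, then take rank differences d_i
d = ranks(abs_resid) - ranks(X)

# Step v: Spearman coefficient and its t statistic with n - 2 df
r_s = 1 - 6 * (d**2).sum() / (n * (n**2 - 1))
t = r_s * np.sqrt(n - 2) / np.sqrt(1 - r_s**2)
print(r_s, t)
```

With errors whose spread grows with X, the rank correlation between |residual| and X comes out positive, and t is compared with the Student's t table at n − 2 degrees of freedom.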

4. Goldfeld-Quandt Test

This test is suggested if the heteroscedastic variance σ_i² is positively related to one of the predictor variables in the regression model.

Consider the two-variable regression model:

Y_i = β1 + β2 X_i + ε_i

Suppose that σ_i² is positively related to X as:

σ_i² = σ² X_i²

Now to test the hypothesis that there is no heteroscedasticity we will follow the following

steps.

Step#1. Rank the observations beginning with the lowest value of X.

Step#2. Omit 'c' central observations where 'c' is fixed in advance, and then divide

the remaining observation into two groups.

Step#3. Fit the OLS regression model to each group separately and obtain the residual sums of squares RSS1 and RSS2, where RSS1 comes from the group with the smaller variance and RSS2 from the group with the larger variance. Both RSS1 and RSS2 have the same degrees of freedom, i.e.

(n − c)/2 − k,  or equivalently  (n − c − 2k)/2

Where k is the number of parameters to be estimated; in the two-variable case, k = 2.

Step#4. Compute the ratio

λ = (RSS2/df) / (RSS1/df)

If the error term ε is normally distributed, i.e. ε ~ N(0, σ²), then λ follows the F distribution with ν1 = ν2 = (n − c − 2k)/2 degrees of freedom.

If the computed value of λ is greater than the tabulated value of F, we can reject the hypothesis of homoscedasticity.
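The four steps can be sketched as follows; the sample size, the choice c = 12, and the data-generating values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, c, k = 60, 12, 2                      # drop c central observations; k parameters
X = np.sort(rng.uniform(1, 10, n))       # Step 1: observations ordered by X
Y = 1.0 + 2.0 * X + rng.normal(0, 0.5 * X)

def group_rss(Xg, Yg):
    # OLS fit within one group and its residual sum of squares
    b1, b0 = np.polyfit(Xg, Yg, 1)
    return ((Yg - (b0 + b1 * Xg))**2).sum()

# Step 2: omit the c central observations and split the rest into two groups
m = (n - c) // 2
rss1 = group_rss(X[:m], Y[:m])           # smaller-variance group (low X)
rss2 = group_rss(X[-m:], Y[-m:])         # larger-variance group (high X)

# Steps 3-4: both groups have df = (n - c - 2k)/2; form the ratio
df = (n - c - 2 * k) / 2
lam = (rss2 / df) / (rss1 / df)
print(df, lam)    # compare lam with the F(df, df) critical value
```

Because the error variance here grows with X, λ comes out well above 1, and it would be compared with the tabulated F value at (22, 22) degrees of freedom.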