ali, redescending m-estimator

Upload: muhammad-ali

Post on 23-Jun-2015

Category:

Education


DESCRIPTION

Detailed notes on autocorrelation, including all mathematical work. Muhammad Ali, Lecturer in Statistics, Higher Education Department, KPK, Pakistan.

TRANSCRIPT

Page 1: Ali, Redescending M-estimator

Muhammad Ali

Lecturer in Statistics

GPGC Mardan.

1

Autocorrelation

Definition

The classical assumptions in linear regression are that the error terms εi have zero mean and constant variance and are uncorrelated [E(εi) = 0, Var(εi) = σ², and E(εi εj) = 0 for i ≠ j]. For the construction of confidence intervals and the testing of hypotheses about the regression coefficients, we add the assumption of normality, so that the εi are NID(0, σ²). Some applications of regression involve regressor and response variables that have a natural sequential order over time. Such data are called time series data. Regression models using time series data occur relatively often in economics, business, and some fields of engineering. The assumption of uncorrelated or independent errors is often not appropriate for time series data. Usually the errors in time series data exhibit serial correlation, that is, E(εi εj) ≠ 0 for some i ≠ j. Such error terms are said to be autocorrelated. Autocorrelation is sometimes called "lagged correlation" or "serial correlation".

Causes of Autocorrelation

Specification Bias:

a) Excluded Variables Case

There are several causes of autocorrelation. Perhaps the primary cause of

autocorrelation in regression problems involving time series data is failure to include one

or more important regressors in the model. For example, suppose that we wish to regress

annual sales of a soft drink company against the annual advertising expenditure for that

product. Now the growth in population over the period of time used in the study will also

influence the product sales. If population size is not included in the model, this may cause

the errors in the model to be positively autocorrelated, because population size is

positively correlated with product sales.

Consider the true model:

Sale (Yt) = β0 + β1X1t + β2X2t + εt ---------------------- ( I )

where Y is the sale, X1 is the advertising expenditure, and X2 is the population size.

However, suppose that for some reason we run the following regression:

Sale (Yt) = β0 + β1X1t + υt ---------------------- ( II )

Since model ( I ) is the true model but we fit model ( II ), the disturbance term of model ( II ) is υt = β2X2t + εt. It absorbs the omitted, steadily growing population variable, and hence υt will be autocorrelated.
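
The excluded-variable case can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficients, the sample size, the linear population trend and the helper names `lag1_autocorr` and `ols_residuals` are all hypothetical and are not taken from the notes.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a residual series."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

def ols_residuals(X, y):
    """Residuals from an OLS fit of y on the columns of X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
x1 = rng.normal(10.0, 2.0, size=n)                  # advertising (assumed values)
x2 = 50.0 + 0.5 * t + rng.normal(0.0, 1.0, size=n)  # population, growing over time
eps = rng.normal(0.0, 1.0, size=n)                  # uncorrelated true errors
y = 5.0 + 2.0 * x1 + 0.8 * x2 + eps                 # "true" model (I)

r_full = lag1_autocorr(ols_residuals(np.column_stack([x1, x2]), y))  # model (I)
r_omit = lag1_autocorr(ols_residuals(x1, y))                         # model (II)
print("lag-1 autocorrelation, X2 included:", round(r_full, 3))
print("lag-1 autocorrelation, X2 omitted: ", round(r_omit, 3))
```

With these assumed values the residuals of the full model are close to uncorrelated, while the residuals of the model that omits X2 inherit the population trend and are strongly positively autocorrelated.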

b) Incorrect Functional Form:

Consider the following cost and output model:

Yt = β1 + β2 X1 + β3 X2² + υt ------------------- ( III )

Suppose that, instead of the above form, which is considered to be correct, we fit the following model:

Yt = β1 + β2 X1 + β3 X2 + υt ----------------( IV)

In this case, υ will reflect autocorrelation because of the use of an incorrect

functional form.
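
A minimal sketch of the functional-form case follows. It simplifies the cost and output model to a single regressor with an assumed, noise-free quadratic curve, so the numbers are purely illustrative and not from the notes.

```python
import numpy as np

x = np.arange(20, dtype=float)        # output levels (assumed)
y = 1.0 + 2.0 * x + 0.5 * x ** 2      # assumed "true" quadratic cost curve, no noise

# Fit the misspecified straight line y = b1 + b2*x by least squares.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Signs of the residuals in the order of the observations.
signs = np.sign(resid)
sign_changes = int(np.sum(signs[1:] != signs[:-1]))
print(signs.astype(int))
print("sign changes:", sign_changes)
```

The residuals of the straight-line fit come in long blocks of the same sign (positive, then negative, then positive), which is the clustering pattern associated with positive autocorrelation.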

Theoretical consequences of autocorrelation

The presence of autocorrelation in the errors has several effects on the ordinary least-squares

regression procedures. These are summarized as follows:

1. Ordinary least-squares regression coefficients are still unbiased.

2. OLS regression coefficients are no longer efficient, i.e. they are no longer minimum-variance estimates. We say that these estimates are inefficient.

3. The residual mean square MSres may seriously underestimate σ². Consequently, the standard errors of the regression coefficients may be too small. Thus, confidence intervals are shorter than they really should be, and tests of hypotheses on individual regression coefficients may indicate that one or more regressors contribute significantly to the model when they really do not. Generally, underestimating σ² gives the researcher a false impression of accuracy.

4. The confidence intervals and tests of hypotheses based on the t and F distributions are no longer appropriate.

OLS estimates in presence of autocorrelation

There are three main consequences of autocorrelation on the ordinary least squares estimates.

1. Ordinary least squares regression coefficients are still unbiased even if the disturbance term is autocorrelated. We know that

$$
\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y \\
&= (X'X)^{-1}X'(X\beta + \varepsilon) \qquad \text{since } Y = X\beta + \varepsilon \\
&= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon \\
&= I\beta + (X'X)^{-1}X'\varepsilon \\
&= \beta + (X'X)^{-1}X'\varepsilon \qquad (1)
\end{aligned}
$$

Taking expectation on both sides of equation (1), treating X as fixed and assuming that E(ε) = 0:

$$
E(\hat{\beta}) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta + 0 = \beta
$$

Hence in the presence of autocorrelation the OLS estimates are still unbiased.
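
The unbiasedness result can be checked numerically. The following Monte Carlo sketch assumes a trending regressor, ρ = 0.8 and hypothetical true coefficients. None of these values comes from the notes, and `simulate_ar1` is a helper introduced here for illustration.

```python
import numpy as np

def simulate_ar1(n, rho, sigma_u, rng):
    """Draw a stationary AR(1) error series e_t = rho * e_{t-1} + u_t."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma_u / np.sqrt(1.0 - rho ** 2))  # stationary start
    u = rng.normal(0.0, sigma_u, size=n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return e

rng = np.random.default_rng(42)
n, rho, sigma_u = 50, 0.8, 1.0
beta = np.array([2.0, 0.5])                        # assumed true coefficients
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
XtX_inv_Xt = np.linalg.inv(X.T @ X) @ X.T          # (X'X)^{-1} X'

# Re-estimate beta on many samples that differ only in the AR(1) errors.
estimates = np.array([
    XtX_inv_Xt @ (X @ beta + simulate_ar1(n, rho, sigma_u, rng))
    for _ in range(5000)
])
mean_estimate = estimates.mean(axis=0)
print("true beta:        ", beta)
print("average estimate: ", mean_estimate)
```

The average of the estimates stays close to the assumed true coefficients even though the errors are strongly autocorrelated, in line with E(β̂) = β.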

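2. The residual mean square underestimates σ², which results in standard errors of the regression coefficients that are too small.

For the two-variable model $Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i$, the variance of the OLS estimate of the slope is

$$
\operatorname{Var}(\hat{\beta}_2) = E[\hat{\beta}_2 - \beta_2]^2 \qquad \text{(A)}
$$

Since

$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum (X_i - \bar{X})Y_i - \bar{Y}\sum (X_i - \bar{X})}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} \qquad \text{since } \sum (X_i - \bar{X}) = 0 \\
&= \frac{\sum x_i Y_i}{\sum x_i^2} \qquad \text{where } x_i = X_i - \bar{X} \\
&= \sum w_i Y_i \qquad \text{putting } w_i = \frac{x_i}{\sum x_i^2} \\
&= \sum w_i(\beta_1 + \beta_2 X_i + \varepsilon_i) \\
&= \beta_1 \sum w_i + \beta_2 \sum w_i X_i + \sum w_i \varepsilon_i \qquad \text{(B)}
\end{aligned}
$$

Since $\sum w_i = \sum x_i / \sum x_i^2 = 0$ and $\sum w_i X_i = \sum w_i x_i = 1$, putting these values in equation (B) gives

$$
\hat{\beta}_2 = \beta_2 + \sum w_i \varepsilon_i \quad\Rightarrow\quad \hat{\beta}_2 - \beta_2 = \sum w_i \varepsilon_i \qquad \text{(C)}
$$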

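Putting the value of equation (C) in equation (A):

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}_2) &= E\Big[\sum w_i \varepsilon_i\Big]^2 = E[w_1\varepsilon_1 + w_2\varepsilon_2 + \dots + w_n\varepsilon_n]^2 \\
&= E[w_1^2\varepsilon_1^2 + w_2^2\varepsilon_2^2 + \dots + w_n^2\varepsilon_n^2 + 2w_1w_2\varepsilon_1\varepsilon_2 + 2w_1w_3\varepsilon_1\varepsilon_3 + \dots + 2w_{n-1}w_n\varepsilon_{n-1}\varepsilon_n] \\
&= E[w_1^2\varepsilon_1^2 + w_2^2\varepsilon_2^2 + \dots + w_n^2\varepsilon_n^2] + 2E[w_1w_2\varepsilon_1\varepsilon_2 + w_1w_3\varepsilon_1\varepsilon_3 + \dots + w_{n-1}w_n\varepsilon_{n-1}\varepsilon_n] \\
&= E\Big[\sum_{i=1}^{n} w_i^2\varepsilon_i^2\Big] + 2E\Big[\sum\sum_{i<j} w_i w_j \varepsilon_i\varepsilon_j\Big] \\
&= \sum_{i=1}^{n} w_i^2 E(\varepsilon_i^2) + 2\sum\sum_{i<j} w_i w_j E(\varepsilon_i\varepsilon_j) \qquad \text{(D)}
\end{aligned}
$$

If there is no correlation between the error terms, i.e. $E(\varepsilon_i\varepsilon_j) = 0$ for $i \neq j$, then equation (D) becomes:

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}_2) &= \sum_{i=1}^{n} w_i^2 E(\varepsilon_i^2) + 0 = \sigma^2 \sum w_i^2 \\
&= \sigma^2 \sum \Big(\frac{x_i}{\sum x_i^2}\Big)^2 = \sigma^2 \frac{\sum x_i^2}{\big(\sum x_i^2\big)^2} \\
&= \frac{\sigma^2}{\sum x_i^2} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \qquad \text{(E)}
\end{aligned}
$$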

Now we have to find Var(β̂2) when the errors are autocorrelated, i.e. when the errors follow an AR(1) process.

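Under AR(1),

$$
\varepsilon_t = \rho\varepsilon_{t-1} + u_t, \qquad u_t \sim NID(0, \sigma_u^2), \qquad |\rho| < 1
$$

$$
\begin{aligned}
E(\varepsilon_t) &= \rho E(\varepsilon_{t-1}) + E(u_t) = 0 \\
\operatorname{Var}(\varepsilon_t) &= \rho^2 \operatorname{Var}(\varepsilon_{t-1}) + \operatorname{Var}(u_t) \;\Rightarrow\; \sigma_\varepsilon^2 = \rho^2\sigma_\varepsilon^2 + \sigma_u^2 \\
\sigma_\varepsilon^2 - \rho^2\sigma_\varepsilon^2 &= \sigma_u^2 \\
\sigma_\varepsilon^2 &= \frac{\sigma_u^2}{1 - \rho^2}
\end{aligned}
$$

as $\operatorname{Var}(\varepsilon_t) = \operatorname{Var}(\varepsilon_{t-1}) = E(\varepsilon_t^2) = E(\varepsilon_{t-1}^2) = \sigma_\varepsilon^2$.

Now

$$
E(\varepsilon_t\varepsilon_{t-1}) = E[(\rho\varepsilon_{t-1} + u_t)\varepsilon_{t-1}] = \rho E(\varepsilon_{t-1}^2) + E(u_t\varepsilon_{t-1}) = \rho\sigma_\varepsilon^2 = \frac{\rho\sigma_u^2}{1 - \rho^2}
$$

Retaining only the cross-products of adjacent errors (terms at higher lags carry higher powers of ρ), we get

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}_2)_{AR1} &= E\Big[\frac{\sum x_t\varepsilon_t}{\sum x_t^2}\Big]^2 = E\Big[\frac{\sum x_t^2\varepsilon_t^2 + 2\sum x_t x_{t-1}\varepsilon_t\varepsilon_{t-1}}{\big(\sum x_t^2\big)^2}\Big] \\
&= \frac{\sigma_\varepsilon^2}{\sum x_t^2} + \frac{2\sum x_t x_{t-1}}{\big(\sum x_t^2\big)^2}\,\frac{\rho\sigma_u^2}{1 - \rho^2} \\
&= \frac{1}{\sum x_t^2}\,\frac{\sigma_u^2}{1 - \rho^2} + \frac{2\sum x_t x_{t-1}}{\big(\sum x_t^2\big)^2}\,\frac{\rho\sigma_u^2}{1 - \rho^2} \\
&= \frac{\sigma_u^2}{1 - \rho^2}\,\frac{1}{\sum x_t^2}\Big[1 + 2\rho\,\frac{\sum x_t x_{t-1}}{\sum x_t^2}\Big] \qquad \text{(F)}
\end{aligned}
$$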

It is clear from equations (E) and (F) that Var(β̂2) < Var(β̂2)AR1 whenever ρ and Σ xt xt−1 are both positive, which is the usual case for time series data. Therefore, if we use Var(β̂2) from equation (E), we shall overstate the precision or accuracy of the estimator β̂2. As a result, the t ratio will be overestimated.
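
The comparison between equations (E) and (F) can be made concrete with a small numerical sketch. The regressor (a simple time trend), ρ = 0.7 and σu = 1 are assumed values chosen only for illustration. The code also computes the exact variance w'Ωw for AR(1) errors, where Ω has entries σε² ρ^|i−j|, a standard result that the notes do not state explicitly.

```python
import numpy as np

n, rho, sigma_u = 30, 0.7, 1.0
sigma_eps2 = sigma_u ** 2 / (1.0 - rho ** 2)   # Var(eps_t) under AR(1)

X = np.arange(1, n + 1, dtype=float)           # assumed trending regressor
x = X - X.mean()                               # deviations from the mean
w = x / np.sum(x ** 2)                         # OLS weights w_t

# Equation (E): conventional formula, valid only for uncorrelated errors.
var_conventional = sigma_eps2 / np.sum(x ** 2)

# Equation (F): keeps only the products of adjacent errors.
lag1_ratio = np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)
var_ar1_lag1 = var_conventional * (1.0 + 2.0 * rho * lag1_ratio)

# Exact variance w' Omega w with Omega[i, j] = sigma_eps2 * rho**|i - j|.
idx = np.arange(n)
omega = sigma_eps2 * rho ** np.abs(idx[:, None] - idx[None, :])
var_ar1_exact = float(w @ omega @ w)

print("conventional (E):   ", var_conventional)
print("AR(1), lag-1 only (F):", var_ar1_lag1)
print("AR(1), exact:       ", var_ar1_exact)
```

For this assumed setup the conventional formula gives the smallest value, so standard errors based on it would be too small and t ratios too large.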

Methods of detection of Autocorrelation

Following are the methods of detecting the problem of autocorrelation:

1. Residual plots: Residual plots can be useful for the detection of autocorrelation. The

most meaningful display is the plot of residuals versus time. If there is positive

autocorrelation, residuals of identical sign occur in clusters. That is, there are not

enough changes of sign in the pattern of residuals. On the other hand, if there is negative

autocorrelation, the residuals will alternate signs too rapidly.

2. The Runs Test: The runs test has the following steps.

Step 1. Write down the plus and minus signs of the residuals.

Step 2. Count the number of plus signs, the number of minus signs, and the total number of runs (a run is an uninterrupted sequence of either plus or minus signs). Now let

N = total number of observations = N1 + N2

N1 = number of + symbols (i.e. + residuals)

N2 = number of - symbols (i.e. - residuals)

R = number of runs

Step 3. Compare the value of R with the tabulated critical values. If R is less than the smaller tabulated value or greater than the larger tabulated value, we reject the null hypothesis H0: the sequence of errors is random. In other words, we conclude that the residuals exhibit autocorrelation.

Special case: If N1 > 20 or N2 > 20 or both, then the number of runs is asymptotically normally distributed with

$$
\text{Mean: } \mu_R = \frac{2N_1N_2}{N} + 1 \qquad \text{Variance: } \sigma_R^2 = \frac{2N_1N_2(2N_1N_2 - N)}{N^2(N - 1)}
$$

The hypothesis is then tested by the same procedure as the Z-test, using Z = (R − μR)/σR.

Note that the runs test is sometimes also known as the Geary test; it is a nonparametric test.
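
The large-sample version of the runs test can be sketched as follows. The function name `runs_test` and the rule of dropping residuals that are exactly zero are assumptions made for this illustration, and the example residuals are artificial.

```python
import math

def runs_test(residuals):
    """Return (N1, N2, R, z) for the runs test on the signs of the residuals."""
    signs = [r > 0 for r in residuals if r != 0]   # drop exact zeros (assumption)
    n1 = sum(signs)                    # number of + residuals
    n2 = len(signs) - n1               # number of - residuals
    n = n1 + n2
    # A new run starts whenever the sign differs from the previous one.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return n1, n2, runs, z

# Artificial example: 25 positive residuals followed by 25 negative ones.
example = [1.0] * 25 + [-1.0] * 25
n1, n2, runs, z = runs_test(example)
print(n1, n2, runs, round(z, 2))
```

Here there are only R = 2 runs against an expected 26, so z is about −6.86. Its absolute value is far above 1.96, and the hypothesis of randomness would be rejected at the 5% level.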

3. Durbin-Watson d Test: This test is based on the assumption that the errors in the regression model are generated by a first-order autoregressive process observed at equally spaced time periods, i.e.

$$
\varepsilon_t = \rho\varepsilon_{t-1} + u_t \qquad \text{(i)}
$$

where εt is the error term in the model at time t, ut is an NID(0, σu²) random variable, and ρ is the autocorrelation parameter. Thus, a simple linear regression model with first-order autoregressive errors would be

$$
y_t = \beta_0 + \beta_1 x_t + \varepsilon_t \qquad \text{(ii)}
$$

where yt and xt are the observations on the response and regressor variables at time period t. The hypotheses usually considered in the Durbin-Watson test are

H0: ρ = 0

H1: ρ > 0

The test statistic is

$$
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
$$

where the et are the residuals from the OLS fit.

The exact critical value of d depends on the regressors, but it lies between two bounds, say dL and dU, such that if d is outside these limits, a conclusion regarding the hypothesis can be reached. The decision procedure is as follows:

Reject H0 if:

0 ≤ d ≤ dL Evidence of positive autocorrelation

4−dL ≤ d ≤ 4 Evidence of negative autocorrelation

Do not reject H0 if:

dU ≤ d ≤ 4−dU

Zone of indecision if:

dL ≤ d ≤ dU or 4−dU ≤ d ≤ 4− dL

Values of dL and dU can be obtained from the Durbin-Watson table.
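
The d statistic and the decision zones can be sketched in a few lines. The function names are hypothetical, and the bounds used in the example (dL = 1.0, dU = 1.5) are placeholders rather than actual table entries. In practice they must be read from the Durbin-Watson table for the given sample size and number of regressors.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    e = [float(v) for v in residuals]
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

def dw_decision(d, d_lower, d_upper):
    """Classify d using the bounds d_lower (dL) and d_upper (dU) from a table."""
    if d <= d_lower:
        return "positive autocorrelation"
    if d >= 4.0 - d_lower:
        return "negative autocorrelation"
    if d_upper <= d <= 4.0 - d_upper:
        return "do not reject H0"
    return "indecision"

# Artificial residuals that alternate in sign at every step.
d = durbin_watson([1, -1, 1, -1])
print(d, dw_decision(d, d_lower=1.0, d_upper=1.5))
```

Residuals that alternate in sign push d towards 4, residuals that stay close to their predecessors push d towards 0, and uncorrelated residuals give values near 2.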

Situations in which negative autocorrelation occurs are not often encountered. However, if a test for negative autocorrelation is desired, one can use the statistic 4−d, where d is defined above; the decision rules for H0 : ρ = 0 versus H1 : ρ < 0 are then the same as those used in testing for positive autocorrelation. It is also possible to conduct a two-sided test (H0 : ρ = 0 versus H1 : ρ ≠ 0) by using both one-sided tests simultaneously. If this is done, the two-sided procedure has Type I error 2α, where α is the Type I error used for each one-sided test.