estimation of arma processes - · pdf fileestimation of arma processes helle bunzel isu...

Estimation of ARMA processes

Helle Bunzel

ISU

February 17, 2009

Helle Bunzel (ISU) Estimation of ARMA processes February 17, 2009 1 / 55

The Box-Jenkins Principle

Historically (in the 1960�s) econometric models with many variablesand equations were developed.

These models �t the data beautifully.They are not good at forecasting.

It turns out that low-dimensional ARMA(p,q) processes typically do abetter job with forecasting.


The Box-Jenkins Principle

The Box-Jenkins Principle for forecasting:

1 Transform the data, if nessesary, so covariance stationarity is aresonable assumption.

2 Make an initial guess at small values of p and q for an ARMA(p, q)model that might describe the (transformed) series.

3 Estimate the parameters.4 Perform diagnostic analysis to con�rm that the model is consistentwith the observed features of the data.

For Step 2 we need information on the correlation structure.


The Autocorrelation Function

Recall that the autocorrelation function is de�ned as:

ρs =γsp

V (yt+s )V (yt )

This is what these look like:


The Autocorrelation Function

More examples


The Partial Autocorrelation Function I

Clearly it is the case that it is hard to tell the processes apart fromthe autocorrelation function alone.

Another important tool is the partial autocorrelation function.

De�nition:

First de-mean y , such that y �t = yt � µThen the �rst partial autocorrelation is φ11 such that

y �t = φ11y�t�1 + ut

The second partial autocorrelation is φ22 where

y �t = φ21y�t�1 + φ22y

�t�2 + ut

Clearly for an AR (1) , φ22 = 0.

These can be found as functions of the autocorrelations:

Note that the m0th autocorrelation simply is the last coe¢ cient in thelinear regression of yt � µ on it�s own m most recent values.


The Partial Autocorrelation Function II

Recall that we can write the coe¢ cients of the projection of yt+1 onXt as

α0 = E�yt+1X 0t

� �E�X 0tXt

��1Here

X 0t = [yt�1 � µ, yt�2 � µ, ..., yt�m � µ],X 0t = [y �t�1, y

�t�2, ..., y

�t�m ]

And we�re projecting y �t , not yt+1.


The Partial Autocorrelation Function III

Now note that

E�X 0tXt

�=

2664E�y �t�1y

�t�1�

E�y �t�1y

�t�2�

... E�y �t�1y

�t�m

�E�y �t�2y

�t�1�

E�y �t�2y

�t�2�

. E�y �t�2y

�t�m

�: : . :

E�y �t�my

�t�1�E�y �t�my

�t�2�... E (y �t�my

�t�m)

3775

=

2664γ0 γ1 ... γm�1γ1 γ0 . γm�2: : . :

γm�1 γm�2 ... γ0

3775and

E�y �t X

0t

�=�E�y �t y

�t�1�E�y �t y

�t�2�... E (y �t y

�t�m)

�Helle Bunzel (ISU) Estimation of ARMA processes February 17, 2009 8 / 55

The Partial Autocorrelation Function IVThis means that we could �nd the partial autocorrelations by takingthe m0th entry of α where

α0 =�

γ1 γ2 ... γm� 2664

γ0 γ1 ... γm�1γ1 γ0 . γm�2: : . :

γm�1 γm�2 ... γ0

3775�1

Note that we need to invert an m�m matrix to get this one.....


The Partial Autocorrelation Function V

The partial autocorrelation of di¤erent models look like:


The Partial Autocorrelation Function VI


The Autocorrelation Functions

An overview of the theoretical properties of the correlation functions:


Sample Autocorrelations I

We want to calculate the autocorrelations from the data and see howthey match up to the theoretical autocorrelations of various processes.This will help us �nd the correct model.

We use the following as sample autocorrelations. These are basicallyestimates of the parameters.

µ = y =1T

T

∑t=1yt

σ2 =1T

T

∑t=1(yt � y)2

rs =∑Tt=s+1 (yt � y) (yt�s � y)

∑Tt=1 (yt � y)

2


Sample Autocorrelations II

Just as with any other estimator we need to know the variance ofthese estimates.

If fytg is stationary with normally distributed errors, Box and Jenkinsfound that

V (rs ) =

( 1T if s = 1

1T

�1+ 2∑s�1

j=1 r2j

�if s > 1

How to use this?

For an MA (s) process, γs is 0.For large samples, rs , is normally distributed.This is enough information to create a con�dence interval.

Use the con�dence interval to test:

First look at r1. If r1 falls outside the con�dence interval, we rejectγ1 = 0.Then we can look at r2 etc.


Sample Autocorrelations III

Do enough of these and one will fall outside the con�dence interval.(size!!)

Ljung-Box Statistic:

Q = T (T + 2)s

∑k=1

r2kT � k

If all the autocorrelations are zero this is χ2 (s) .

Can also be used to check that the residual are white noise. Then

under the null Q � χ2 (s � p � q) , or, if a constant is included,Q � χ2 (s � p � q � 1)


Goodness of Fit

Just as in standard models you can asses goodness of �t.Fit can always be improved by adding to p and q.Criteria which trade of �t vs a more parsemonious model:

Akaike information critereon:

AIC = T ln (SSR) + 2n

Schwartz Bayesian criterion:

SBC = T ln (SSR) + n lnT

Where n : number of estimated parameters and T : number of usableobservations.Keep T the same even if estimating �rst an AR (1) and then anAR (2)SBC punishes additional parameters harder.SBC is consistent, AIC is not.


Estimation of ARMA models. IA data example

In general, estimation of AR models can be done with simpleregressions, but MA processes are more complicated.

For now let the computer do the estimation.

We will concern ourselves with selecting the right model.

Suppose 100 data points are generated according to:

yt = 0.7yt�1 + εt

First calculate the sample (partial) auto correlations.

The �rst three sample autocorrelations are r1 = 0.74, r2 = 0.58 andr3 = 0.47.

Note that γ1 = 0.7, γ2 = 0.49 and γ3 = 0.343.

The �rst sample partial autocorrelation is 0.71.


Estimation of ARMA models. IIA data example

Graph of sample correlations:

Recall that

V (rs ) =

( 1T if s = 1

1T

�1+ 2∑s�1

j=1 r2j

�if s > 1

And that if we have an MA (q) then ρq+1 = 0.


Estimation of ARMA models. IIIA data example

Test MA (0) vs MA (1) . If the process is an MA (0) , then the standarddeviation of r1 is 0.1.We had r1 = 0.74.Clearly reject that it is 0.V (r2) = 0.021 and the standard deviation is 0.1449. Again we rejectthat r2 is 0.Seems like we might have some action.

For the partial autocorrelation only lags 1 and 12 seem to besigni�cant.

Either AR(1), AR(12) or ARMA(1, 1).Box Jenkins would indicate looking only at AR(1) and ARMA(1, 1)unless we have monthly data:


Estimation of ARMA models. IVA data example


Estimation of ARMA models.

For further diagnostics, we can plot the sample autocorrelations ofthe residuals:


Estimation of ARMA models. I

More on the example:

Does there seem to be a model which allows most of PACF to be 0,but not at lag 12?

First consider the ACF and the PACF for the ARMA(1,1) model:

yt = a1yt�1 + εt + b1εt�1

First ACF. From Yule-Walker you get:

E (ytyt ) = a1E (ytyt�1) + E (yt εt ) + b1E (yt εt�1),γ0 = a1γ1 + σ2 + b1E (yt εt�1)

E (yt εt�1) = E ((a1yt�1 + εt + b1εt�1) εt�1)

= a1σ2 + b1σ2 = (a1 + b1) σ2


Estimation of ARMA models. IIγ0 = a1γ1 + σ2 + b1 (a1 + b1) σ2 = a1γ1 + [1+ b1 (a1 + b1)] σ

2

E (yt�1yt ) = a1E (yt�1yt�1) + E (yt�1εt ) + b1E (yt�1εt�1),γ1 = a1γ0 + b1σ

2

Solve:

γ0 = a1γ1 + [1+ b1 (a1 + b1)] σ2

= a21γ0 + a1b1σ2 + [1+ b1 (a1 + b1)] σ2 ,

γ0 =1+ 2a1b1 + b21

1� a21σ2

γ1 = a1γ0 + b1σ2 =

a1 + a21b1 + a1b21 + b1

1� a21σ2 ,

γ1 =(a1 + b1) (1+ a1b1)

1� a21σ2


Estimation of ARMA models. III

E (yt�2yt ) = a1E (yt�2yt�1) + E (yt�2εt ) + b1E (yt�2εt�1),γ2 = a1γ1

So, declining ACF.....

Now PACF, Recall we need the m0th entry of α :

α0 =�

γ1 γ2 ... γm� 2664

γ0 γ1 ... γm�1γ1 γ0 . γm�2: : . :

γm�1 γm�2 ... γ0

3775�1

φ11 =γ1γ0=(a1 + b1) (1+ a1b1)1+ 2a1b1 + b21


Estimation of ARMA models. IV

α0 =�

γ1 γ2� � γ0 γ1

γ1 γ0

��1=

hγ0

γ1γ20�γ21

� γ1γ2

γ20�γ21γ0

γ2γ20�γ21

� γ21γ20�γ21

i

φ22 =γ0γ2 � γ21

γ20 � γ21=

γ2γ0��

γ1γ0

�21�

�γ1γ0

�2 = a1γ1γ0��

γ1γ0

�21�

�γ1γ0

�2=

a1φ11 � φ2111� φ211

φ22 = 0,φ11 = a1


Estimation of ARMA models. V

α0 =�

γ1 γ2 γ3� 24 γ0 γ1 γ2

γ1 γ0 γ1γ2 γ1 γ0

35�1

φ33 = γ1γ21 � γ0γ2

(γ0 � γ2) (γ20 � 2γ21 + γ0γ2)

� γ1γ2

γ20 + γ2γ0 � 2γ21

+γ1γ3

γ0γ1 � γ1γ2

γ20 � γ21γ20 + γ2γ0 � 2γ21

=γ1

(γ20 � 2γ21 + γ0γ2)

"γ21 � γ0γ2(γ0 � γ2)

� γ2 +γ3�γ20 � γ21

�γ0γ1 � γ1γ2

#


Estimation of ARMA models. VI

γ21 � γ0γ2(γ0 � γ2)

� γ2 +γ3�γ20 � γ21

�γ0γ1 � γ1γ2

=γ31 + γ1γ

22 + γ20γ3 � γ21γ3 � 2γ0γ1γ2

γ1 (γ0 � γ2)

= γ0φ311 + a

21φ311 + a

21φ11 � a21φ311 � 2a1φ211

φ11 (1� a1φ11)

= γ0φ311 + a

21φ11 � 2a1φ211

φ11 (1� a1φ11)= γ0

φ211 + a21 � 2a1φ11

(1� a1φ11)

= γ0(φ11 � a1)

2

(1� a1φ11)


Estimation of ARMA models. VII

φ33 =γ1

(γ20 � 2γ21 + γ0γ2)

"γ21 � γ0γ2(γ0 � γ2)

� γ2 +γ3�γ20 � γ21

�γ0γ1 � γ1γ2

#

=γ1γ0

(γ20 � 2γ21 + γ0γ2)

(φ11 � a1)2

(1� a1φ11)

So, for φ33 = 0 and φ11 > 0, we�d need

φ11 = a1

Why is this not compatible with our model?


Estimation of ARMA models. VIIIYou can also see this by solving:

φ11 =(a1 + b1) (1+ a1b1)1+ 2a1b1 + b21

= a1

(a1 + b1) (1+ a1b1)1+ 2a1b1 + b21

= a1 ,

(a1 + b1) (1+ a1b1)� a1�1+ 2a1b1 + b21

�= 0,

�b1 (a1 � 1) (a1 + 1) = 0,b1 = 0


Estimation of ARMA models. IX

Now calculate the ACF and the PACF for the model

yt = a1yt�1 + εt + b12εt�12

From Yule-Walker you get:

E (ytyt ) = a1E (ytyt�1) + E (yt εt ) + b12E (yt εt�12),γ0 = a1γ1 + σ2 + b12E (yt εt�12)

E (yt εt�12) = E��

11� a1L

(εt + b12εt�12)�

εt�12

�= E

" ∞

∑i=0ai1εt�i + b12

∞

∑i=0ai1εt�12�i

!#εt�12

!= E

�a121 ε2t�12 + b12ε

2t�12

�=

�a121 + b12

�σ2


Estimation of ARMA models. X

γ0 = a1γ1 + γ0 + b12E (yt εt�12),γ0 = a1γ1 + σ2 + b12

�a121 + b12

�σ2

E (yt�1yt ) = a1E (yt�1yt�1) + E (yt�1εt ) + b12E (yt�1εt�12),γ1 = a1γ0 + b12E (yt�1εt�12)

E (yt�1εt�12) = E��

11� a1L

(εt�1 + b12εt�13)�

εt�12

�= E

" ∞

∑i=0ai1εt�1�i + b12

∞


!#εt�12

!= a111 σ2

γ1 = a1γ0 + b12a111 σ2


Estimation of ARMA models. XIPlug into �rst equation

γ0 = a1γ1 + σ2 + b12�a121 + b12

�σ2 ,

γ0 = a1�a1γ0 + b12a

111 σ2

�+ σ2 + b12

�a121 + b12

�σ2�

1� a21�

γ0 = b12a121 σ2 + σ2 + b12�a121 + b12

�σ2

γ0 =1+ 2b12a121 + b

212

1� a21σ2

γ1 = a1γ0 + b12a111 σ2

= a11+ 2b12a121 + b

212

1� a21σ2 + b12a111 σ2

=a1 + 2b12a131 + b

212a1 + b12a

111 � b12a131

1� a21σ2

=a1 + b12a131 + b

212a1 + b12a

111

1� a21σ2


Estimation of ARMA models. XII

E (yt�2yt ) = a1E (yt�2yt�1) + E (yt�2εt ) + b12E (yt�2εt�12),γ2 = a1γ1 + b12E (yt�2εt�12)

E (yt�2εt�12) = E��

11� a1L

(εt�2 + b12εt�14)�

εt�12

�= E

" ∞

∑i=0ai1εt�2�i + b12

∞


!#εt�12

!= a101 σ2


Estimation of ARMA models. XIII

γ2 = a1γ1 + b12a101 σ2

= a1a1 + b12a131 + b

212a1 + b12a

111

1� a21σ2 + b12a101 σ2

=a21 + b12a

141 + b

212a

21 + b12a

121 + b12a

101 � b12a121

1� a21σ2

=a21 + b12a

141 + b

212a

21 + b12a

101

1� a21σ2

γj =1+ β12a

121 + b

212 + β12a

12�2j1

1� a21aj1σ

2, 1 � j � 12

γj = a1γj�1, j � 13


Estimation of ARMA models. XIV

This provides autocorrelation coe¢ cients:

ρ1 =γ1γ0=

a1+b12a131 +b212a1+b12a

111

1�a21σ2

1+2b12a121 +b212

1�a21σ2

= a11+ b12a121 + b12a

101 + b

212

1+ 2b12a121 + b212

When 2 � j � 12

ρj =γjγ0=

1+b12a121 +b212+b12a

12�2j1

1�a21aj1σ

2

1+2b12a121 +b212

1�a21σ2

=1+ b12a121 + b

212 + b12a

12�2j1

1+ 2b12a121 + b212

aj1

andρj =

γjγ0=a1γj�1

γ0= a1ρj�1, j � 13


Estimation of ARMA models. XV

So, decaying autocorrelations after lag 12.

Now, the PACF. Starting with the second:

α0 =�

γ1 γ2� � γ0 γ1

γ1 γ0

��1=

hγ0γ1�γ1γ2

γ20�γ21

γ0γ2�γ21γ20�γ21

iThus,

φ22 =γ0γ2 � γ21

γ20 � γ21=

ρ2 � ρ211� ρ21

φ33 =γ1

(γ20 � 2γ21 + γ0γ2)

"(φ11 � a1)

2

(1� a1φ11)

#For this to be 0 we need

ρ2 = ρ21


Estimation of ARMA models. XVI

but that implies

1+b12a121 +b212+b12a

81

1+2b12a121 +b212

a21 = a21

�1+b12a121 +b12a

101 +b

212

1+2b12a121 +b212

�2,

�1+ b12a121 + b

212 + b12a

81

� �1+ 2b12a121 + b

212

�=

�1+ b12a121 + b12a

101 + b

212

�2β12a

81 (a1 � 1)

2 (a1 + 1)2 �b212 + b12a121 + 1� = 0,

b12 = 0_ a1 = 0

Not 0 for this model.

Clearly PACF 2� 11 are NOT 0 for this model.


Estimation of ARMA models. XVIIAnother model candidate would be

yt = a1yt�1 + a2yt�12 + εt

Calculate ACF:

E (ytyt ) = a1E (ytyt�1) + a2E (ytyt�12) + E (yt εt ),γ0 = a1γ1 + a2γ12 + σ2

E (yt�1yt ) = a1E (yt�1yt�1) + a2E (yt�1yt�12) + E (yt�1εt ),γ1 = a1γ0 + a2γ11




Estimation of ARMA models. XVIII

Skip to the 12th:



We have 13 equations with 13 unknows.


Estimation of ARMA models. XIX

Computer or much patience can solve these.

γ0 = a1γ1 + a2γ12 + σ2

γ1 = a1γ0 + a2γ11γ2 = a1γ1 + a2γ10γ3 = a1γ2 + a2γ9γ4 = a1γ3 + a2γ8γ5 = a1γ4 + a2γ7γ6 = a1γ5 + a2γ6γ7 = a1γ6 + a2γ5γ8 = a1γ7 + a2γ4γ9 = a1γ8 + a2γ3

γ10 = a1γ9 + a2γ2γ11 = a1γ10 + a2γ1γ12 = a1γ11 + a2γ0


Estimation of ARMA models. XX

φ11 =γ1γ0

φ22 =γ0γ2 � γ21

γ20 � γ21= 0,

γ0γ2 = γ21 ,σ4a101 a2K

= 0

Again this one cannot be 0 and be compatible with the model inquestion.....


AR(2) example

Now consider data generated by

yt = 0.7yt�1 � 0.49yt�2 + εt

The estimation results are:

Model looks good.


AR(2) example

ACF and PACF for the series

Here there are a few that are too large. ACF 16 and PACF 17.This might have tempted us to estimate a di¤erent model.ACF for residuals is large at 14 and 17Also Ljung-Box(16) is signi�cant.


AR(2) example

Try the following model:

yt = a1yt�1 + a2yt�2 + εt + β16εt�16

Estimation results are:

This looks like a better model!


Parameter Estimation IEstimation of AR models

The simple-minded approach would be to run regression of y on plags of y and use OLS estimates

Let�s consider the properties of this estimator

Use AR(1) as exampleAssume y0 is available

Then:

ρ =∑Tt=1 (yt�1 � yt�1) (yt � yt )

∑Tt=1 (yt�1 � yt�1) (yt�1 � yt�1)

= ρ+∑Tt=1 (yt�1 � yt�1) (εt � εt )

∑Tt=1 (yt�1 � yt�1) (yt�1 � yt�1)

We would like to answer questions about bias, consistency, varianceetc


Parameter Estimation IIEstimation of AR models

We have to regard �regressor�as stochastic.

It is not clear that

E

(∑Tt=1 (yt�1 � yt�1) (εt � εt )

∑Tt=1 (yt�1 � yt�1) (yt�1 � yt�1)

)= 0

We can�t derive explicit expression for bias, but it can be shown that:

The OLS estimate is biased and bias is negative. This bias often calledHurwicz bias � it can be sizeable in small samplesHurwicz bias goes to zero as T ! ∞The OLS estimate is consistentThe OLS estimator is asymptotically normal with usual formulae forasymptotic variance.


Parameter Estimation IIIEstimation of AR models

Note that to estimate an AR(p) model by OLS does not useinformation contained in �rst p observations

This causes a loss of e¢ ciency from this

There are a number of methods which use this information.


Maximum Likelihood Parameter Estimation, AR processes I

Now consider estimation of an AR (1) process using maximumlikelihood.

Assume the errors are normal white noise.

The model is:yt = c + φyt�1 + εt

We wish to estimate c , φ and σ2.

The �rst observation y1 is normally distributed with

E (y1) = µ =c

1� φ

V (y1) =σ2

1� φ2


Maximum Likelihood Parameter Estimation, AR processesII

The density of this observation is

fY1�y1; c , φ, σ2

�=

1q2π σ2

1�φ2

exp

0B@��y1 � c

1�φ

�22 σ2

1�φ2

1CANow consider y2. We know that

y2 = c + φy1 + ε2

We can easily see that

fY2 jY1�y2jy1; c , φ, σ2

�=

1p2πσ2

exp

� (y2 � (c + φy1))

2

2σ2

!


Maximum Likelihood Parameter Estimation, AR processesIII

Recall thatfY1,Y2 = fY2 jY1 � fY1

Similarlyy3 = c + φy2 + ε3

and

fY3 jY1,Y2�y3jy1, y2; c , φ, σ2

�=

1p2πσ2

exp

� (y3 � (c + φy2))

2

2σ2

!

Continuing, we get

fY1,Y2,...,YT�y1, y2, ..., yT ; c , φ, σ

2�= fY1

�y1; c , φ, σ2

� T

∏t=2fYt jYt�1

�yt jyt�1; c , φ, σ2


Maximum Likelihood Parameter Estimation, AR processesIV

Then the log-likelihood is

L�c , φ, σ2

�= �1

2log (2π)� 1

2log�

σ2

1� φ2

��

�y1 � c

1�φ

�22 σ2

1�φ2

�T � 12

log (2π)� T � 12

log�σ2�

�T

∑t=2

(yt � (c + φyt�1))2

2σ2

This one must be maximized numerically.

Another option is to do maximum likelihood conditional on the �rstobservation. This turns out to give the same result as the OLSestimation with the same drawbacks.


Maximum Likelihood Parameter Estimation, MA processesI

We will consider a MA (1) process:

yt = µ+ εt + θεt�1

Assume the errors are normal white noise.

We wish to estimate µ, θ and σ2.

If we knew the value of εt�1, we could write

fYt jεt�1�yt jεt�1; µ, θ, σ2

�=

1p2πσ2

exp

� (yt � µ� θεt�1)

2

2σ2

!

Now, if we knew with certainty that ε0 = 0,

y1 = µ+ ε1 � N�µ, σ2


Maximum Likelihood Parameter Estimation, MA processesII

Note that this trick is similar to assuming the �rst observation wasknown for the AR (1) process.

Also, when y1 is know, so is ε1 = y1 � µ. Therefore

fY2 jY1,ε0=0�y2jy1, ε0 = 0; µ, θ, σ2

�=

1p2πσ2

exp

� (y2 � µ� θε1)

2

2σ2

!

The, sincey2 = µ+ ε2 + θε1

ε2 is also known when ε1 is.


Maximum Likelihood Parameter Estimation, MA processesIII

Continue like this to �nd the density functions.

fYt jYt�1,Yt�2,...,Y1,ε0=0�yt jyt�1, yt�2, ..., y1, ε0 = 0; µ, θ, σ2

�= fYt jεt�1

�yt jεt�1; µ, θ, σ2

�=

1p2πσ2

exp�� ε2t2σ2

�From this we get the log-likelihood function

L�µ, θ, σ2

�= �T

2log (2π)� T

2log�σ2��

T

∑t=2

ε2t2σ2

where εt is calculated recursively from the data.

It is not simple to get the MLE from this expression, so numericalmethods must be used to maximize the likelihood function.Helle Bunzel (ISU) Estimation of ARMA processes February 17, 2009 54 / 55

Maximum Likelihood Parameter Estimation, MA processesIV

It turns out that if jθj < 1, the e¤ect of assuming ε0 = 0 dies out asT becomes big.

There are also many methods for calculating the MLE for MAprocesses without this assumption, but these too are non-trivialnumerical methods.


estimation of arma processes - · pdf fileestimation of arma processes helle bunzel isu...

Documents