arima model

ARIMA MODEL Time Series Analysis & Forecasting

Upload: jassika

Post on 12-Apr-2017


Category:

Engineering



TRANSCRIPT

Page 1: Arima model

ARIMA MODEL: Time Series Analysis & Forecasting

Page 2: Arima model

Contents
• Introduction to ARIMA
• Assumptions
• ARIMA Models
• Pros & Cons
• Procedure for ARIMA Modeling (Box-Jenkins Approach)

Page 3: Arima model

Introduction to ARIMA

ARIMA is an acronym for Auto Regressive Integrated Moving Average.

It is a prediction model used for time series analysis and forecasting. (A time series is a collection of observations of well-defined data items obtained through repeated measurements over time.)

Ex: measuring the level of unemployment each month of the year would comprise a time series.

Page 4: Arima model

A time series can also show the impact of cyclical, seasonal and irregular events on the data item being measured.

Here the terms are:
• Auto Regressive: lags of the variable itself
• Integrated: differencing steps required to make the series stationary
• Moving Average: lags of previous information shocks (forecast errors)

Page 5: Arima model

A non-seasonal ARIMA model is classified as an "ARIMA(p, d, q)" model, where:
• p is the number of autoregressive terms,
• d is the number of non-seasonal differences needed for stationarity, and
• q is the number of lagged forecast errors in the prediction equation.
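
The three orders can be read off a single prediction equation. The form below is a common textbook way of writing it, not something taken from the slides; the symbols c, φ, θ, ε and the backshift operator B are notation assumed here, and some texts write the θ terms with the opposite sign.

```latex
% y'_t is the series after differencing d times (B is the backshift operator: B y_t = y_{t-1})
y'_t = (1 - B)^d \, y_t

% ARIMA(p, d, q): p lagged values of y'_t plus q lagged forecast errors
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
         + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```

Setting d = 0 and q = 0 leaves a pure AR(p) model; setting d = 0 and p = 0 leaves a pure MA(q) model.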

Page 6: Arima model

Assumptions

The data series used by ARIMA should be stationary. Stationary means that the statistical properties of the series (such as its mean and variance) do not depend on the time at which it is observed. A white noise series is stationary, and a series with cyclic behavior (but no trend or seasonality) can also be considered stationary.

A non-stationary series is made stationary by differencing.

Page 7: Arima model

Data should be univariate: ARIMA works on a single variable. Auto-regression is regression of the variable on its own past values.

Page 8: Arima model

ARIMA Models

Auto Regressive (AR) Model:
• The value of a variable in one period is related to its values in previous periods.
• AR(p): the current value depends on its own p previous values.
• p is the order of the AR process.
• Ex: ARIMA(1,0,0) or AR(1)

Moving Average (MA) Model:
• Accounts for the possibility of a relationship between a variable and the residuals from previous periods.

Page 9: Arima model

• MA(q): the current deviation from the mean depends on the q previous error terms (shocks).
• q is the order of the MA process.
• Only error terms are present.
• Ex: ARIMA(0,0,1) or MA(1)

ARMA Model: both AR and MA terms are present, e.g. ARIMA(1,0,1) or ARMA(1,1).

ARIMA Model: a differencing term is also included, e.g. ARIMA(1,1,1) = ARMA(1,1) with first differencing.

ARIMAX: some exogenous variables are also included.
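
To make the model families above concrete, here is a minimal sketch in base R that simulates one series of each kind with arima.sim(). The coefficient values (0.7 and 0.5), the series length and the seed are arbitrary choices for illustration, not values from the slides.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# AR(1): each value depends on the previous value plus a new shock
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)

# MA(1): each value depends on the current shock and the previous shock
ma1 <- arima.sim(model = list(ma = 0.5), n = 200)

# ARIMA(1,1,1): an ARMA(1,1) process that has been integrated once
arima111 <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.7, ma = 0.5),
                      n = 200)

# Differencing the ARIMA(1,1,1) series once gives back an ARMA(1,1) series,
# which is one observation shorter
arma11 <- diff(arima111)
c(length(arima111), length(arma11))
```

Plotting each series with plot() is a quick way to see how the integrated series wanders while the AR and MA series stay around a fixed level.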

Page 10: Arima model

ARIMA + X = ARIMAX

ARIMA with exogenous (external) variables is important when external variables start impacting the series.

Ex: flight delay prediction depends not only on historical time series data but also on external variables such as weather conditions (temperature, pressure, humidity, visibility), arrival of other flights, waiting time, etc.
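
As a rough sketch of the ARIMAX idea, base R's arima() accepts external regressors through its xreg argument. The data below are entirely synthetic: the names delay and visibility, the effect size of -2 and the AR coefficient of 0.5 are assumptions made up for this illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 120

# Hypothetical exogenous variable (stand-in for a weather measurement)
visibility <- rnorm(n)

# Synthetic delay series: linear effect of visibility plus AR(1) noise
noise <- arima.sim(model = list(ar = 0.5), n = n)
delay <- 10 - 2 * visibility + noise

# ARIMA(1,0,0) with one external regressor
fit_x <- arima(delay, order = c(1, 0, 0),
               xreg = cbind(visibility = visibility))

# Coefficients: AR term, intercept, and the estimated effect of visibility
coef(fit_x)
```

The estimated visibility coefficient should land close to the -2 used to generate the data, which is a simple sanity check that the regressor is being picked up.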

Page 11: Arima model

Pros & Cons

Pros:
1. Helps to better understand the patterns in a time series
2. Forecasting based on ARIMA

Cons:
• Captures only linear relationships; hence a neural network model or genetic model could be used if a non-linear association (ex: a quadratic relation) is found in the variables.

Page 12: Arima model

Procedure for ARIMA Modeling
• Ensure stationarity: determine the appropriate value of d.
• Make correlograms (ACF & PACF): the PACF indicates the AR terms and the ACF shows the MA terms.
• Fit the model: estimate an ARIMA model using the values of p, d and q you think are appropriate.
• Diagnostic test: check the residuals of the estimated ARIMA model; pick the best model with well-behaved residuals.
• Forecasting: use the fitted model for forecasting.

Page 13: Arima model

The Box-Jenkins Approach
1. Difference the series to achieve stationarity.
2. Identify the model.
3. Estimate the parameters of the model.
Diagnostic checking: is the model adequate?
• No: return to step 2.
• Yes: go to step 4.
4. Use the model for forecasting.

Page 14: Arima model

Step-1: Stationarity

In order to model a time series with the Box-Jenkins approach, the series has to be stationary.

If the process is non-stationary, first differences of the series are computed to determine whether that operation results in a stationary series.

The process is continued until a stationary time series is found.

The number of differencing rounds required then determines the value of d.
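
A small sketch of repeated differencing with base R's diff(). The series is built by integrating white noise twice, so d = 2 is known by construction; with real data you would check stationarity after each round instead of knowing d in advance.

```r
set.seed(7)                     # arbitrary seed, for reproducibility only
e <- rnorm(300)                 # white noise shocks
x <- cumsum(cumsum(e))          # integrated twice, so d = 2 by construction

d1 <- diff(x)                   # first difference: still non-stationary here
d2 <- diff(x, differences = 2)  # second difference: recovers the shocks

# Each round of differencing costs one observation
c(length(x), length(d1), length(d2))   # 300 299 298
```

Over-differencing adds unnecessary noise, so the usual advice is to stop at the smallest d that gives a stationary series.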

Page 15: Arima model

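Testing Stationarity

Dickey-Fuller test
• The null hypothesis is that the time series has a unit root (is non-stationary).
• For the series to be treated as stationary, the p-value has to be less than 0.05 (5%).
• If the p-value is greater than 0.05, you fail to reject the null hypothesis and conclude that the time series may have a unit root.
• In that case, you should first difference the series before proceeding with the analysis.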

Page 16: Arima model

What is the DF test?

Imagine a series where the current value depends on a fraction of the previous value of the series.

DF fits a regression of the change in the current value, Δy_t, on the previous value, y_{t-1}: Δy_t = δ·y_{t-1} + ε_t. The null hypothesis δ = 0 corresponds to a unit root.

The usual t-distribution is not valid for this statistic, so Dickey and Fuller developed appropriate critical values. If the p-value of the DF test is < 5%, the series is treated as stationary.
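
The regression behind the test can be sketched by hand in base R. The helper df_stat() below is a hypothetical name introduced for this illustration; it includes an intercept and returns only the t-statistic, which has to be compared against Dickey-Fuller critical values (about -2.9 at the 5% level for a couple of hundred observations) rather than the usual t tables. In practice a packaged test such as adf.test() from the tseries package would normally be used instead.

```r
set.seed(123)                    # arbitrary seed, for reproducibility only
n <- 200
white_noise <- rnorm(n)          # stationary by construction
random_walk <- cumsum(rnorm(n))  # has a unit root by construction

# Hypothetical helper: t-statistic of delta in
#   diff(y)_t = alpha + delta * y_{t-1} + e_t
df_stat <- function(y) {
  dy   <- diff(y)            # y_t - y_{t-1}
  ylag <- y[-length(y)]      # y_{t-1}
  fit  <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

df_stat(white_noise)   # strongly negative: the unit root is rejected
df_stat(random_walk)   # usually much closer to zero: the unit root is not rejected
```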

Page 17: Arima model

Step-2: Making Correlograms

AutoCorrelation Function (ACF): it is a correlation coefficient. However, instead of the correlation between two different variables, it is the correlation between two values of the same variable, X_i and X_{i+k}, that are k periods apart.

It is computed at lag 1, lag 2, lag 3, etc.

The ACF represents the degree of persistence over the respective lags of a variable.
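
A minimal sketch of computing the ACF in base R, using a simulated AR(1) series (coefficient 0.8, an arbitrary choice) so that the persistence is visible. The lag-1 value is also computed by hand to show what the function is doing.

```r
set.seed(10)   # arbitrary seed, for reproducibility only
x <- arima.sim(model = list(ar = 0.8), n = 500)

# Sample autocorrelations for lags 0 to 10 (plot = FALSE returns the numbers)
acf_x <- acf(x, lag.max = 10, plot = FALSE)
round(acf_x$acf[1:4], 2)   # lags 0, 1, 2, 3; lag 0 is always 1

# Lag-1 autocorrelation by hand: correlation of the series with itself
# shifted by one period
xc <- as.numeric(x) - mean(x)
n  <- length(xc)
r1 <- sum(xc[-1] * xc[-n]) / sum(xc^2)
r1
```

Calling acf(x) without plot = FALSE draws the correlogram with its confidence bands.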

Page 18: Arima model

[Figure: ACF graph. Autocorrelations of presap (y-axis, -0.50 to 1.00) against lag (x-axis, 0 to 40), with Bartlett's formula for MA(q) 95% confidence bands.]

Page 19: Arima model

Partial Autocorrelation Function (PACF): the exclusive correlation coefficient.

The "partial" correlation between two variables is the amount of correlation between them which is not explained by their mutual correlations with a specified set of other variables.

For example, if we are regressing a variable Y on other variables X1, X2, and X3, the partial correlation between Y and X3 is the amount of correlation between Y and X3 that is not explained by their common correlations with X1 and X2.

Partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. For a time series, the PACF at lag k is therefore the correlation between X_i and X_{i+k} after removing the effect of the values in between.
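
A short sketch of the PACF in base R, using a simulated AR(2) series (coefficients 0.6 and 0.25, chosen arbitrarily). For an AR(p) process the partial autocorrelations are expected to be clearly non-zero up to lag p and close to zero afterwards, which is why the PACF is used to suggest the AR order.

```r
set.seed(11)   # arbitrary seed, for reproducibility only
x <- arima.sim(model = list(ar = c(0.6, 0.25)), n = 1000)   # AR(2)

# Sample partial autocorrelations; note that pacf() starts at lag 1, not lag 0
pacf_x <- pacf(x, lag.max = 10, plot = FALSE)
round(pacf_x$acf[1:5], 2)   # lags 1 to 5

# Approximate 95% band used to judge which lags stand out
band <- 1.96 / sqrt(length(x))
which(abs(pacf_x$acf[1:10]) > band)   # lags outside the band
```

At lag 1 there is nothing in between to remove, so the partial autocorrelation equals the ordinary lag-1 autocorrelation.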

Page 20: Arima model

[Figure: PACF graph. Partial autocorrelations of presap (y-axis, -0.50 to 1.00) against lag (x-axis, 0 to 40), with 95% confidence bands [se = 1/sqrt(n)].]

Page 21: Arima model

Fit the Model

Fit the model based on the AR & MA terms.

Make use of the auto.arima(x) function (from the forecast package in R), where x is the data series. It tries various combinations of AR & MA terms and finds the best model based on the lowest AIC (Akaike Information Criterion) or its small-sample corrected variant, AICc.

For fitting the model, use the arima(x, order=c(p,d,q)) function.
Ex: fit = arima(x, order=c(4,0,2))

order=c(p,d,q) is the model order received from the auto.arima(x) function.
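
The idea of picking the order with the lowest AIC can be sketched in base R alone. This is a hand-rolled stand-in for the search, not how auto.arima() is implemented; the simulated data, the seed and the four candidate orders are arbitrary choices for illustration.

```r
set.seed(2017)   # arbitrary seed, for reproducibility only
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300)

# Candidate (p, d, q) orders: a tiny hand-picked grid for illustration only
orders <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1), c(2, 0, 0))

# Fit each candidate and collect its AIC
fits <- lapply(orders, function(o) arima(x, order = o, method = "ML"))
aics <- sapply(fits, function(f) f$aic)

# Keep the model with the lowest AIC
best <- fits[[which.min(aics)]]
data.frame(order = sapply(orders, paste, collapse = ","), aic = round(aics, 1))
best
```

A lower AIC rewards a better fit but penalizes every extra coefficient, so the largest model does not win automatically.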

Page 22: Arima model

Diagnostic Test

First find the residuals: use the residuals(model) function.
Ex: fit_resid = residuals(fit)

Now run diagnostics on these residuals. (A residual in forecasting is the difference between an observed value and its forecast based on other observations: e_i = y_i − ŷ_i. For time series forecasting, a residual is based on one-step forecasts; that is, ŷ_t is the forecast of y_t based on observations y_1, …, y_{t−1}.)

If the residuals are IID (in particular, show no autocorrelation), then the model fits adequately.

Page 23: Arima model

For diagnostics use different tests, e.g. the Ljung-Box test.

Make use of the Box.test() function to find the p-value.

Ex: Box.test(fit_resid, lag=10, type="Ljung-Box")

If the p-value is greater than 0.05, there is no evidence of serial correlation in the residuals; the model fits adequately and can be used for forecasting.
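
Putting the diagnostic and forecasting steps together, here is a minimal end-to-end sketch in base R on simulated AR(1) data. The data, the seed, the lag of 10 and the 12-step horizon are arbitrary choices; the fitdf argument is set to p + q so the test accounts for the coefficients that were estimated.

```r
set.seed(99)   # arbitrary seed, for reproducibility only
x <- arima.sim(model = list(ar = 0.6), n = 300)

# Fit the model and extract the one-step-ahead residuals
fit <- arima(x, order = c(1, 0, 0))
fit_resid <- residuals(fit)

# Ljung-Box test on the residuals; fitdf = p + q = 1 for this model
lb <- Box.test(fit_resid, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value   # above 0.05 means no evidence of leftover autocorrelation

# Forecast the next 12 periods, with standard errors
fc <- predict(fit, n.ahead = 12)
fc$pred
fc$se
```

The standard errors grow with the horizon, which reflects how quickly forecast uncertainty accumulates beyond the first few steps.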

Page 24: Arima model

Thanks