Machine Learning is Fun – Part 12 – ARIMA

What is ARIMA?

Actually, it is a combination of two models – AR and MA.

AR – Auto Regressive.

MA – Moving Average.

Lots of jargon? Let me make it clear.

Auto Regressive means that each value in a time series is correlated with the values that precede it, so the current value can be modelled as a regression on its own past values.

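y_{t}=\delta +\varphi _{1}y_{t-1}+\varphi _{2}y_{t-2}+\cdots +\varphi _{p}y_{t-p}+\varepsilon _{t}

Where: yt-1, yt-2, …, yt-p are the past series values (lags), δ is a constant, φ1, …, φp are the parameters of the model and εt is a white noise error term.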
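
To get a feel for the AR equation, here is a minimal Python sketch that generates a series from it. The function name simulate_ar, the zero starting history and the coefficient values are my own assumptions for illustration, not something taken from a library.

```python
import random

def simulate_ar(delta, phis, n, noise_sd=1.0, seed=0):
    """Generate n values of y_t = delta + sum(phi_i * y_{t-i}) + eps_t."""
    rng = random.Random(seed)
    p = len(phis)
    # Assumption: the unknown values before the start are set to zero,
    # so the first few outputs are only a warm-up.
    history = [0.0] * p
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, noise_sd)  # white noise error term
        # history[-1] is y_{t-1}, history[-2] is y_{t-2}, and so on.
        lagged = sum(phi * history[-i] for i, phi in enumerate(phis, start=1))
        y = delta + lagged + eps
        series.append(y)
        history.append(y)
    return series

# Illustrative AR(2) with made-up coefficients.
ar2 = simulate_ar(delta=1.0, phis=[0.5, 0.3], n=200)
print(ar2[:5])
```

With the noise switched off (noise_sd=0.0) the series settles at δ / (1 − φ1 − φ2), which is a handy sanity check on the recursion.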

In a model, the order of the AR part is determined from the partial autocorrelation function (PACF), which gives the partial correlation of a time series with its own lagged values, that is, the correlation at a given lag after the effect of the shorter lags has been removed.

MA, or moving average, is denoted by

X_{t}=\mu +\varepsilon _{t}+\theta _{1}\varepsilon _{t-1}+\cdots +\theta _{q}\varepsilon _{t-q}

where μ is the mean of the series, the θ1, …, θq are the parameters of the model and the εt, εt−1,…, εt−q are white noise error terms.

A moving-average model is conceptually a linear regression of the current value of the series against current and previous (observed) white noise error terms.
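
The MA equation can be sketched in a few lines of Python as well. The helper ma_series and the sample shocks are hypothetical, and I am assuming that shocks before the start of the series are zero.

```python
def ma_series(mu, thetas, eps):
    """X_t = mu + eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}."""
    values = []
    for t in range(len(eps)):
        x = mu + eps[t]
        for j, theta in enumerate(thetas, start=1):
            if t - j >= 0:  # assumption: shocks before the start are zero
                x += theta * eps[t - j]
        values.append(x)
    return values

# Made-up shocks, only to show the arithmetic of an MA(1) model.
shocks = [1.0, 2.0, -1.0, 0.0]
print(ma_series(mu=10.0, thetas=[0.5], eps=shocks))  # [11.0, 12.5, 10.0, 9.5]
```

Notice that each value depends only on the current shock and the previous one, never on earlier values of the series itself. That is the difference from AR.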

So I think you can now differentiate between AR and MA by their equations – right?

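Now, the ‘I’ stands for integrated. It refers to the number of times the actual data is differenced (each value is replaced by its change from the previous value) to convert it into a stationary series. Note that this is differencing, not the differentiation technique from calculus.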
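
Making a series stationary is usually done by differencing, which means subtracting each value from the one that follows it. A small sketch, where the function name difference is my own:

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        # Each pass replaces the series with its successive changes.
        series = [b - a for a, b in zip(series, series[1:])]
    return list(series)

trend = [1, 4, 9, 16, 25]          # a series with a curved trend
print(difference(trend, d=1))      # [3, 5, 7, 9]
print(difference(trend, d=2))      # [2, 2, 2]
```

Each pass shortens the series by one value. Here two passes are enough to remove the trend completely.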

There are three parameters in an ARIMA model: p, d and q.

Now, the value of q (the order of the MA part) is read from the auto-correlation plot.

The value of p (the order of the AR part) is read from the partial auto-correlation plot.

And, as mentioned in my previous post, the value of d is the number of times the series is differenced.

How to calculate the value of p and q?

ACF stands for auto-correlation function. The value of q is the lag just before the first inverted line in the ACF plot.

As you can see in the above picture, the first inverted line is at lag 2 (counting starts from 0). So the value of q will be 1.

Similarly, you can read the value of p from the PACF plot.
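
If you want to see the numbers behind the ACF and PACF plots, here is a rough sketch in plain Python. The PACF uses the Durbin-Levinson recursion on the sample ACF, and the ±1.96/√n band is the commonly used approximate 95% limit. Both are my additions for illustration, and the test series is simulated, not real data.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_new
    return result

# Simulated AR(1) series with a made-up coefficient of 0.7.
rng = random.Random(0)
demo, prev = [], 0.0
for _ in range(300):
    prev = 0.7 * prev + rng.gauss(0.0, 1.0)
    demo.append(prev)

band = 1.96 / math.sqrt(len(demo))  # approximate 95% significance band
acf_vals = acf(demo, 5)
pacf_vals = pacf(demo, 5)
for lag in range(6):
    print(lag, round(acf_vals[lag], 3), round(pacf_vals[lag], 3),
          abs(pacf_vals[lag]) > band)
```

Lag 0 is always 1 in both lists, which is why the counting in the plots starts from 0.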

Next, you can use the function named auto.arima() (from the forecast package in R) to get the values of p and q automatically.

But I recommend using the arima() function because, for analytical purposes, time-series modelling never follows any strict rule. You always need some trial and error to tweak the values of p and q. The key point is that we always need to check the diagnostic values, AIC and BIC. The best model tends to have the lowest AIC and BIC.
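
To show how the comparison works, here is a small sketch using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and L the likelihood of the fitted model. The candidate orders and log-likelihood values below are made-up placeholders, not results from any real fit.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood

# (p, d, q) -> (log-likelihood, number of estimated parameters).
# Hypothetical numbers, for illustration only.
candidates = {
    (1, 1, 0): (-250.0, 2),
    (1, 1, 1): (-245.0, 3),
    (2, 1, 2): (-244.5, 5),
}
n_obs = 100  # assumed series length

scores = {order: (aic(ll, k), bic(ll, k, n_obs))
          for order, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])
print(best_by_aic, best_by_bic)
```

Both criteria reward a better fit and penalise extra parameters. BIC penalises them more heavily once n is larger than about 7, so the two can disagree.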

That’s all for today. Stay tuned. ☺
