# 8.7 ARIMA modelling in R

## How does auto.arima() work ?

The `auto.arima()`

function in R uses a variation of the Hyndman and Khandakar algorithm which
combines unit root tests, minimization of the AICc and MLE to obtain an ARIMA model. The algorithm follows these steps.

#### Hyndman-Khandakar algorithm for automatic ARIMA modelling

The number of differences $d$ is determined using repeated KPSS tests.

The values of $p$ and $q$ are then chosen by minimizing the AICc after differencing the data $d$ times. Rather than considering every possible combination of $p$ and $q$, the algorithm uses a stepwise search to traverse the model space.

(a) The best model (with smallest AICc) is selected from the following four:

ARIMA(2,d,2),

ARIMA(0,d,0),

ARIMA(1,d,0),

ARIMA(0,d,1).If $d=0$ then the constant $c$ is included; if $d\ge1$ then the constant $c$ is set to zero. This is called the "current model".

(b) Variations on the current model are considered:

- vary $p$ and/or $q$ from the current model by $\pm1$;
- include/exclude $c$ from the current model.

The best model considered so far (either the current model, or one of these variations) becomes the new current model.

(c) Repeat Step 2(b) until no lower AICc can be found.

## Choosing your own model

If you want to choose the model yourself, use the `Arima()`

function in
R. For example, to fit the ARIMA(0,0,3) model to the US consumption
data, the following commands can be used.

There is another function `arima()`

in R which also fits an ARIMA model.
However, it does not allow for the constant $c$ unless $d=0$, and it
does not return everything required for the `forecast()`

function.
Finally, it does not allow the estimated model to be applied to new data
(which is useful for checking forecast accuracy). Consequently, it is
recommended that you use `Arima()`

instead.

## Modelling procedure

When fitting an ARIMA model to a set of time series data, the following procedure provides a useful general approach.

- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
- If the data are non-stationary: take first differences of the data until the data are stationary.
- Examine the ACF/PACF: Is an AR($p$) or MA($q$) model appropriate?
- Try your chosen model(s), and use the AICc to search for a better model.
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.

The automated algorithm only takes care of steps 3–5. So even if you use it, you will still need to take care of the other steps yourself.

The process is summarized in Figure 8.10.

## Example 8.2 Seasonally adjusted electrical equipment orders

We will apply this procedure to the seasonally adjusted electrical equipment orders data shown in Figure 8.11.

plot(eeadj)

- The time plot shows some sudden changes, particularly the big drop in 2008/2009. These changes are due to the global economic environment. Otherwise there is nothing unusual about the time plot and there appears to be no need to do any data adjustments.
- There is no evidence of changing variance, so we will not do a Box-Cox transformation.
- The data are clearly non-stationary as the series wanders up and
down for long periods. Consequently, we will take a first difference
of the data. The differenced data are shown in Figure 8.12. These
look stationary, and so we will not consider further differences.
R codetsdisplay(diff(eeadj),main="")
- The PACF shown in Figure 8.12 is suggestive of an AR(3) model. So an initial candidate model is an ARIMA(3,1,0). There are no other obvious candidate models.
- We fit an ARIMA(3,1,0) model along with variations including
ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. Of these, the
ARIMA(3,1,1) has a slightly smaller AICc value.
R output> fit <- Arima(eeadj, order=c(3,1,1))

> summary(fit)

Series: eeadj

ARIMA(3,1,1)

Coefficients:

ar1 ar2 ar3 ma1

0.0519 0.1191 0.3730 -0.4542

s.e. 0.1840 0.0888 0.0679 0.1993

sigma^2 estimated as 9.532: log likelihood=-484.08

AIC=978.17 AICc=978.49 BIC=994.4 - The ACF plot of the residuals from the ARIMA(3,1,1) model shows all
correlations within the threshold limits indicating that the
residuals are behaving like white noise. A portmanteau test returns
a large p-value, also suggesting the residuals are white noise.
R codeAcf(residuals(fit))

Box.test(residuals(fit), lag=24, fitdf=4, type="Ljung") - Forecasts from the chosen model are shown in Figure 8.13.

If we had used the automated algorithm instead, we would have obtained exactly the same model in this example.

## Understanding constants in R

A non-seasonal ARIMA model can be written as

\begin{equation}\label{eq:c}\tag{8.2} (1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)e_t \qquad\qquad \end{equation}

or equivalently as

\begin{equation}\label{eq:mu}\tag{8.3} (1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d (y_t - \mu t^d/d!) = (1 + \theta_1 B + \cdots + \theta_q B^q)e_t, \qquad\qquad \end{equation}

where $c = \mu(1-\phi_1 - \cdots - \phi_p )$ and $\mu$ is the mean of $(1-B)^d y_t$. R uses the parametrization of equation (\ref{eq:mu}). Thus, the inclusion of a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order $d$ in the forecast function. (If the constant is omitted, the forecast function includes a polynomial trend of order $d-1$.) When $d=0$, we have the special case that $\mu$ is the mean of $y_t$.

### arima()

By default, the `arima()`

command in R sets $c=\mu=0$ when $d>0$
and provides an estimate of $\mu$ when $d=0$. The parameter
$\mu$ is called the “intercept” in the R output. It will be close to
the sample mean of the time series, but usually not identical to it as
the sample mean is not the maximum likelihood estimate when $p+q>0$.
The `arima()`

command has an argument `include.mean`

which only has an
effect when $d=0$ and is `TRUE`

by default. Setting
`include.mean=FALSE`

will force $\mu=0$.

### Arima()

The `Arima()`

command from the forecast package provides more
flexibility on the inclusion of a constant. It has an argument
`include.mean`

which has identical functionality to the corresponding
argument for `arima()`

. It also has an argument `include.drift`

which
allows $\mu\ne0$ when $d=1$. For $d>1$, no constant is allowed
as a quadratic or higher order trend is particularly dangerous when
forecasting. The parameter $\mu$ is called the “drift” in the R
output when $d=1$.

There is also an argument `include.constant`

which,
if `TRUE`

, will set `include.mean=TRUE`

if $d=0$ and
`include.drift=TRUE`

when $d=1$. If `include.constant=FALSE`

, both
`include.mean`

and `include.drift`

will be set to `FALSE`

. If
`include.constant`

is used, the values of `include.mean=TRUE`

and
`include.drift=TRUE`

are ignored.

### auto.arima()

The `auto.arima()`

function automates the inclusion of a constant. By
default, for $d=0$ or $d=1$, a constant will be included if it
improves the AIC value; for $d>1$ the constant is always omitted. If
`allowdrift=FALSE`

is specified, then the constant is only allowed when
$d=0$.