8.5 Nonseasonal ARIMA models
If we combine differencing with autoregression and a moving average model, we obtain a nonseasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average model (“integration” in this context is the reverse of differencing). The full model can be written as
$$y'_{t} = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t, \tag{8.1}$$
where $y'_{t}$ is the differenced series (it may have been differenced more than once). The “predictors” on the right-hand side include both lagged values of $y_t$ and lagged errors. We call this an ARIMA($p, d, q$) model, where

$p =$ order of the autoregressive part;
$d =$ degree of first differencing involved;
$q =$ order of the moving average part.
The same stationarity and invertibility conditions that are used for autoregressive and moving average models apply to this ARIMA model.
Once we start combining components in this way to form more complicated models, it is much easier to work with the backshift notation. Then equation (8.1) can be written as
$$(1-\phi_1 B - \cdots - \phi_p B^p)(1-B)^d y_{t} = c + (1+\theta_1 B + \cdots + \theta_q B^q)e_t.$$
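As a concrete illustration, an ARIMA(1,1,1) model (one autoregressive term, one difference, one moving average term) in backshift notation is:

```latex
% ARIMA(1,1,1) in backshift notation:
% AR(1) factor, first difference, MA(1) factor
(1-\phi_1 B)\,(1-B)\,y_t = c + (1+\theta_1 B)\,e_t
```

Multiplying out the left-hand side recovers the usual form involving lagged values of $y_t$.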
Selecting appropriate values for $p$, $d$ and $q$ can be difficult. The auto.arima() function in R will do it for you automatically. Later in this chapter, we will learn how the function works, and some methods for choosing these values yourself.
Many of the models we have already discussed are special cases of the ARIMA model as shown in the following table.
| White noise             | ARIMA(0,0,0)                  |
| Random walk             | ARIMA(0,1,0) with no constant |
| Random walk with drift  | ARIMA(0,1,0) with a constant  |
| Autoregression          | ARIMA($p$,0,0)                |
| Moving average          | ARIMA(0,0,$q$)                |
Example 8.1 US personal consumption
Figure 8.7 shows quarterly percentage changes in US consumption expenditure. Although it is quarterly data, there appears to be no seasonal pattern, so we will fit a nonseasonal ARIMA model.
The following R code was used to automatically select a model.
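A call along the following lines produces the output below. This is a sketch assuming the usconsumption data from the fpp package and the forecast package, rather than the exact code used.

```r
library(fpp)       # provides the usconsumption data and loads forecast
fit <- auto.arima(usconsumption[,1], seasonal=FALSE)
fit
```
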
ARIMA(0,0,3) with nonzero mean
Coefficients:
ma1 ma2 ma3 intercept
0.2542 0.2260 0.2695 0.7562
s.e. 0.0767 0.0779 0.0692 0.0844
sigma^2 estimated as 0.3856: log likelihood=154.73
AIC=319.46 AICc=319.84 BIC=334.96
This is an ARIMA(0,0,3) or MA(3) model: $$y_t = 0.756 + e_t + 0.254 e_{t-1} + 0.226 e_{t-2} + 0.269 e_{t-3}, $$ where $e_t$ is white noise with standard deviation $0.62 = \sqrt{0.3856}$. Forecasts from the model are shown in Figure 8.8.
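As a rough check on this model, we can simulate a long series from the fitted MA(3) process and confirm that its sample mean and standard deviation are close to the estimates above (a sketch using base R's arima.sim() with the fitted coefficients):

```r
set.seed(1)
# simulate the fitted MA(3): errors have variance 0.3856, series mean 0.756
sim <- 0.756 + arima.sim(model=list(ma=c(0.254, 0.226, 0.269)),
                         n=100000, sd=sqrt(0.3856))
mean(sim)   # close to 0.756
sd(sim)     # close to sqrt(0.3856*(1 + 0.254^2 + 0.226^2 + 0.269^2))
```
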
Understanding ARIMA models
The auto.arima() function is very useful, but anything automated can be a little dangerous, and it is worth understanding something of the behaviour of the models even when you rely on an automatic procedure to choose the model for you. The constant $c$ has an important effect on the long-term forecasts obtained from these models.
 If $c=0$ and $d=0$, the long-term forecasts will go to zero.
 If $c=0$ and $d=1$, the long-term forecasts will go to a nonzero constant.
 If $c=0$ and $d=2$, the long-term forecasts will follow a straight line.
 If $c\ne0$ and $d=0$, the long-term forecasts will go to the mean of the data.
 If $c\ne0$ and $d=1$, the long-term forecasts will follow a straight line.
 If $c\ne0$ and $d=2$, the long-term forecasts will follow a quadratic trend.
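For example, the straight-line behaviour when $c\ne0$ and $d=1$ can be seen by fitting a random walk with drift, i.e. ARIMA(0,1,0) with a constant (a sketch using the forecast package on simulated data):

```r
library(forecast)
set.seed(123)
y <- cumsum(rnorm(100, mean=0.5))   # random walk with drift of about 0.5
fit <- Arima(y, order=c(0,1,0), include.drift=TRUE)
fc <- forecast(fit, h=10)
diff(fc$mean)   # constant increments: the point forecasts follow a straight line
```
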
The value of $d$ also has an effect on the prediction intervals: the higher the value of $d$, the more rapidly the prediction intervals increase in size. For $d=0$, the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will all be essentially the same.
This behaviour is seen in Figure 8.8 where $d=0$ and $c\ne 0$. In this figure, the prediction intervals are the same for the last few forecast horizons, and the point forecasts are equal to the mean of the data.
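We can verify this behaviour for the MA(3) model of Example 8.1: the $h$-step forecast standard deviation is $\sqrt{\sigma^2(1+\theta_1^2+\dots+\theta_{h-1}^2)}$, which stops growing once $h>3$ (a sketch using the parameter estimates above):

```r
sigma2 <- 0.3856
theta <- c(0.254, 0.226, 0.269)
# forecast standard deviation at horizon h for an MA(3) model
fc_sd <- function(h) sqrt(sigma2 * (1 + sum(theta[seq_len(min(h-1, 3))]^2)))
sapply(1:6, fc_sd)   # increases up to h=4, then stays constant
```
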
The value of $p$ is important if the data show cycles. To obtain cyclic forecasts, it is necessary to have $p\ge2$ along with some additional conditions on the parameters. For an AR(2) model, cyclic behaviour occurs if $\phi_1^2+4\phi_2<0$. In that case, the average period of the cycles is$^{1}$ $$\frac{2\pi}{\text{arc cos}(-\phi_1(1-\phi_2)/(4\phi_2))}. $$
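For instance, with hypothetical AR(2) coefficients $\phi_1=1.2$ and $\phi_2=-0.7$ (chosen so that $\phi_1^2+4\phi_2<0$), the average cycle length works out to about 8.3 periods:

```r
phi1 <- 1.2; phi2 <- -0.7                 # hypothetical AR(2) coefficients
phi1^2 + 4*phi2                           # -1.36, negative: cyclic behaviour
2*pi / acos(-phi1*(1 - phi2)/(4*phi2))    # average period, about 8.3
```
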
ACF and PACF plots
It is usually not possible to tell, simply from a time plot, what values of $p$ and $q$ are appropriate for the data. However, it is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for $p$ and $q$.
Recall that an ACF plot shows the autocorrelations which measure the relationship between $y_t$ and $y_{t-k}$ for different values of $k$. Now if $y_t$ and $y_{t-1}$ are correlated, then $y_{t-1}$ and $y_{t-2}$ must also be correlated. But then $y_t$ and $y_{t-2}$ might be correlated, simply because they are both connected to $y_{t-1}$, rather than because of any new information contained in $y_{t-2}$ that could be used in forecasting $y_t$.
To overcome this problem, we can use partial autocorrelations. These measure the relationship between $y_{t}$ and $y_{t-k}$ after removing the effects of other time lags $1, 2, 3, \dots, k-1$. So the first partial autocorrelation is identical to the first autocorrelation, because there is nothing between them to remove. The partial autocorrelations for lags 2, 3 and greater are calculated as follows:
\begin{align*} \alpha_k&= \text{$k$th partial autocorrelation coefficient} \\ &= \text{the estimate of $\phi_k$ in the autoregression model} \\ & y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k} + e_t. \end{align*}
Varying the number of terms on the right hand side of this autoregression model gives $\alpha_k$ for different values of $k$. (In practice, there are more efficient algorithms for computing $\alpha_k$ than fitting all these autoregressions, but they give the same results.)
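The definition can be checked directly: fit the AR($k$) regression by least squares and read off the last coefficient (a sketch; in practice the pacf() function in R computes these more efficiently):

```r
# estimate the k-th partial autocorrelation as the coefficient on y_{t-k}
# in an OLS regression of y_t on y_{t-1}, ..., y_{t-k}
partial_acf <- function(y, k) {
  n <- length(y)
  lags <- sapply(1:k, function(j) y[(k + 1 - j):(n - j)])  # lag-j columns
  fit <- lm(y[(k + 1):n] ~ lags)
  unname(coef(fit)[k + 1])   # last coefficient is the estimate of phi_k
}
```

Applied to a simulated AR(2) series, partial_acf(y, 2) should be close to $\phi_2$, while partial_acf(y, 3) should be close to zero.
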
Figure 8.9 shows the ACF and PACF plots for the US consumption data shown in Figure 8.7.
The partial autocorrelations have the same critical values of $\pm 1.96/\sqrt{T}$ as for ordinary autocorrelations, and these are typically shown on the plot as in Figure 8.9.
Acf(usconsumption[,1],main="")
Pacf(usconsumption[,1],main="")
If the data are from an ARIMA($p,d,0$) or ARIMA($0,d,q$) model, then the ACF and PACF plots can be helpful in determining the value of $p$ or $q$. If both $p$ and $q$ are positive, then the plots do not help in finding suitable values of $p$ and $q$.
The data may follow an ARIMA($p,d,0$) model if the ACF and PACF plots of the differenced data show the following patterns:
 the ACF is exponentially decaying or sinusoidal;
 there is a significant spike at lag $p$ in the PACF, but none beyond lag $p$.
The data may follow an ARIMA($0,d,q$) model if the ACF and PACF plots of the differenced data show the following patterns:
 the PACF is exponentially decaying or sinusoidal;
 there is a significant spike at lag $q$ in the ACF, but none beyond lag $q$.
In Figure 8.9, we see that there are three spikes in the ACF and then no significant spikes thereafter (apart from one just outside the bounds at lag 14). In the PACF, there are three spikes decreasing with the lag, and then no significant spikes thereafter (apart from one just outside the bounds at lag 8). We can ignore one significant spike in each plot if it is just outside the limits, and not in the first few lags. After all, the probability of a spike being significant by chance is about one in twenty, and we are plotting 21 spikes in each plot. The pattern in the first three spikes is what we would expect from an ARIMA(0,0,3) as the PACF tends to decay exponentially. So in this case, the ACF and PACF lead us to the same model as was obtained using the automatic procedure.

arc cos is the inverse cosine function. You should be able to find it on your calculator. It may be labelled acos or cos$^{-1}$. ↩