# 9.2 Vector autoregressions

One limitation of the models we have considered so far is that they impose a unidirectional relationship: the forecast variable is influenced by the predictor variables, but not vice versa. However, there are many cases where the reverse should also be allowed for, where all variables affect each other. Consider the series in Example 9.1. The changes in personal consumption expenditure ($C_t$) are forecast based on the changes in personal disposable income ($I_t$). In this case a bi-directional relationship may be more suitable: an increase in $I_t$ will lead to an increase in $C_t$ and vice versa.

An example of such a situation occurred in Australia during the Global Financial Crisis of 2008–2009. The Australian government issued stimulus packages that included cash payments in December 2008, just in time for Christmas spending. As a result, retailers reported strong sales and the economy was stimulated. Consequently, incomes increased.

Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they influence each other equally. In more formal terminology, all variables are now treated as “endogenous”. To signify this we now change the notation and write all variables as $y$s: $y_{1,t}$ denotes the $t$th observation of variable $y_1$, $y_{2,t}$ denotes the $t$th observation of variable $y_2$, and so on.

A VAR model is a generalisation of the univariate autoregressive model for forecasting a collection of variables; that is, a vector of time series.1 It comprises one equation per variable considered in the system. The right-hand side of each equation includes a constant and lags of all the variables in the system. To keep it simple, we will consider a two-variable VAR with one lag. We write a 2-dimensional VAR(1) as

\begin{align*} y_{1,t} &= c_1+\phi_{11,1}y_{1,t-1}+\phi_{12,1}y_{2,t-1}+e_{1,t} \label{eq91a} \tag{9.1a}\\ y_{2,t} &= c_2+\phi_{21,1}y_{1,t-1}+\phi_{22,1}y_{2,t-1}+e_{2,t} \label{eq91b} \tag{9.1b} \end{align*}

where $e_{1,t}$ and $e_{2,t}$ are white noise processes that may be contemporaneously correlated. Coefficient $\phi_{ii,\ell }$ captures the influence of the $\ell$th lag of variable $y_i$ on itself, while coefficient $\phi_{ij,\ell }$ captures the influence of the $\ell$th lag of variable $y_j$ on $y_i$.
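Stacking equations (9.1a) and (9.1b) gives a single vector equation, which is where the "vector" in the name comes from. The bold symbols $\boldsymbol{y}_t$, $\boldsymbol{c}$, $\boldsymbol{\Phi}_1$ and $\boldsymbol{e}_t$ are shorthand introduced here for illustration only:

\begin{align*} \begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} &= \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \phi_{11,1} & \phi_{12,1} \\ \phi_{21,1} & \phi_{22,1} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} e_{1,t} \\ e_{2,t} \end{bmatrix}, \\ \boldsymbol{y}_t &= \boldsymbol{c} + \boldsymbol{\Phi}_1 \boldsymbol{y}_{t-1} + \boldsymbol{e}_t. \end{align*}

Row $i$ of $\boldsymbol{\Phi}_1$ collects the coefficients of equation $i$, so the off-diagonal entries $\phi_{12,1}$ and $\phi_{21,1}$ are the ones that carry the feedback between the two variables.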

If the series modelled are stationary we forecast them by directly fitting a VAR to the data (known as a “VAR in levels”). If the series are non-stationary we take differences to make them stationary and then we fit a VAR model (known as a “VAR in differences”). In both cases, the models are estimated equation by equation using the principle of least squares. For each equation, the parameters are estimated by minimising the sum of squared $e_{i,t}$ values.
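The equation-by-equation estimation described above can be sketched in base R. This is a minimal illustration on simulated data, not the routine used by any particular package; the intercepts, coefficient matrix, noise level and all object names (`cst`, `Phi`, `fit1`, `fit2`, and so on) are made up for the example.

```r
# Simulate a stationary 2-dimensional VAR(1). All numbers are illustrative.
set.seed(1)
n   <- 500
cst <- c(0.5, 0.2)                                   # intercepts c_1, c_2
Phi <- matrix(c(0.5, 0.1,
                0.2, 0.3), nrow = 2, byrow = TRUE)   # row i: phi_{i1,1}, phi_{i2,1}
y   <- matrix(0, nrow = n, ncol = 2)
for (t in 2:n) {
  y[t, ] <- as.vector(cst + Phi %*% y[t - 1, ]) + rnorm(2, sd = 0.5)
}

# Regress each variable at time t on both variables at time t-1.
d <- data.frame(y1    = y[-1, 1], y2    = y[-1, 2],
                y1.l1 = y[-n, 1], y2.l1 = y[-n, 2])
fit1 <- lm(y1 ~ y1.l1 + y2.l1, data = d)   # equation (9.1a)
fit2 <- lm(y2 ~ y1.l1 + y2.l1, data = d)   # equation (9.1b)

# Collect the estimates in the same layout as cst and Phi.
c.hat   <- c(coef(fit1)[1], coef(fit2)[1])
Phi.hat <- rbind(coef(fit1)[-1], coef(fit2)[-1])
Phi.hat
```

Each call to `lm` minimises the sum of squared residuals for one equation, so with a reasonably long series `Phi.hat` should be close to the `Phi` used in the simulation.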

The other possibility, which is beyond the scope of this book and is not explored here, is that the series are non-stationary but cointegrated, which means that there exists a linear combination of them that is stationary. In this case a VAR specification that includes an error correction mechanism (usually referred to as a vector error correction model) should be used, together with alternative estimation methods to least squares.2

Forecasts are generated from a VAR in a recursive manner. The VAR generates forecasts for each variable included in the system. To illustrate the process, assume that we have fitted the 2-dimensional VAR(1) described in equations \eqref{eq91a}–\eqref{eq91b} for all observations up to time $T$. Then the one-step-ahead forecasts are generated by

\begin{align*} \hat y_{1,T+1|T} & =\hat{c}_1+\hat\phi _{11,1}y_{1,T}+\hat\phi _{12,1}y_{2,T} \\ \hat y_{2,T+1|T} & =\hat{c}_2+\hat\phi _{21,1}y_{1,T}+\hat\phi _{22,1}y_{2,T}. \end{align*}

This is the same form as \eqref{eq91a}–\eqref{eq91b} except that the errors have been set to zero and parameters have been replaced with their estimates. For $h=2$, the forecasts are given by

\begin{align*} \hat y_{1,T+2|T} & =\hat{c}_1+\hat\phi_{11,1}\hat y_{1,T+1|T}+\hat\phi _{12,1}\hat y_{2,T+1|T}\\ \hat y_{2,T+2|T}& =\hat{c}_2+\hat\phi _{21,1}\hat y_{1,T+1|T}+\hat\phi _{22,1}\hat y_{2,T+1|T}. \end{align*}

Again, this is the same form as \eqref{eq91a}–\eqref{eq91b} except that the errors have been set to zero, parameters have been replaced with their estimates, and the unknown values of $y_1$ and $y_2$ have been replaced with their forecasts. The process can be iterated in this manner for all future time periods.
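The recursion can be written as a short loop. The sketch below assumes a VAR(1) whose estimates are already available; the function name `var1_forecast` and the numerical values of `c.hat`, `Phi.hat` and `y.T` are hypothetical and chosen only so the arithmetic is easy to check by hand.

```r
# Recursive point forecasts from a K-dimensional VAR(1).
#   c.hat   : length-K vector of estimated intercepts
#   Phi.hat : K x K matrix of estimated lag-1 coefficients
#   y.T     : length-K vector holding the last observation
#   h       : forecast horizon
var1_forecast <- function(c.hat, Phi.hat, y.T, h) {
  fc   <- matrix(NA_real_, nrow = h, ncol = length(y.T))
  prev <- y.T
  for (i in seq_len(h)) {
    # Errors are set to zero; unknown future values are replaced by forecasts.
    prev    <- as.vector(c.hat + Phi.hat %*% prev)
    fc[i, ] <- prev
  }
  fc
}

# Made-up estimates and last observation
c.hat   <- c(0.5, 0.2)
Phi.hat <- matrix(c(0.5, 0.1,
                    0.2, 0.3), nrow = 2, byrow = TRUE)
y.T     <- c(1, 2)

fc <- var1_forecast(c.hat, Phi.hat, y.T, h = 2)
fc
# Row 1 (h = 1): 1.2, 1.00
# Row 2 (h = 2): 1.2, 0.74
```

For example, the first entry is $0.5 + 0.5\times 1 + 0.1\times 2 = 1.2$, and the $h=2$ forecast of the second variable is $0.2 + 0.2\times 1.2 + 0.3\times 1.0 = 0.74$, which uses the $h=1$ forecasts in place of the unknown observations.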

There are two decisions one has to make when using a VAR to forecast: how many variables (denoted by $K$) and how many lags (denoted by $p$) should be included in the system. The number of coefficients to be estimated in a VAR is equal to $K+pK^2$ (or $1+pK$ per equation). For example, for a VAR with $K=5$ variables and $p=3$ lags, there are 16 coefficients per equation, making a total of 80 coefficients to be estimated. The more coefficients to be estimated, the larger the estimation error entering the forecast.
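To get a feel for how quickly the parameter count grows, the formula can be wrapped in a small helper. The function name `var_n_coef` is hypothetical and used only for this illustration.

```r
# Number of coefficients in a VAR with K variables, p lags and a constant.
var_n_coef <- function(K, p) {
  per_equation <- 1 + p * K          # constant plus p lags of each of K variables
  c(per_equation = per_equation,
    total        = K * per_equation) # K equations, i.e. K + p * K^2
}

var_n_coef(K = 5, p = 3)   # per_equation = 16, total = 80

# Totals for K = 2, ..., 6 variables with p = 3 lags
sapply(2:6, function(K) var_n_coef(K, p = 3)["total"])
```

Because the total grows with $K^2$, adding a variable is far more costly in terms of parameters than adding a lag.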

In practice it is usual to keep $K$ small and include only variables that are correlated with each other and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included.

VARs are implemented in the vars package in R. It contains a function VARselect to choose the number of lags $p$ using four different information criteria: AIC, HQ, SC and FPE. We have met the AIC before, and SC is simply another name for the BIC (SC stands for Schwarz Criterion after Gideon Schwarz who proposed it). HQ is the Hannan-Quinn criterion and FPE is the “Final Prediction Error” criterion.3 Care should be taken using the AIC as it tends to choose large numbers of lags. Instead, for VAR models, we prefer to use the BIC.

A criticism VARs face is that they are atheoretical. They are not built on some economic theory that imposes a theoretical structure on the equations. Every variable is assumed to influence every other variable in the system, which makes direct interpretation of the estimated coefficients very difficult. Despite this, VARs are useful in several contexts:

1. forecasting a collection of related variables where no explicit interpretation is required;
2. testing whether one variable is useful in forecasting another (the basis of Granger causality tests);
3. impulse response analysis, where the response of one variable to a sudden but temporary change in another variable is analysed;
4. forecast error variance decomposition, where the proportion of the forecast variance of one variable is attributed to the effect of other variables.
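As a rough sketch of how these four uses look in practice, the code below fits a small VAR with the vars package and calls the corresponding functions. It uses the `Canada` data set that ships with vars rather than the data from this chapter, and the choice of variables (`e` and `U`), the lag order `p = 2` and the horizon of 8 quarters are arbitrary assumptions made for illustration.

```r
library(vars)

# Quarterly Canadian macroeconomic series supplied with the vars package.
data(Canada)
fit <- VAR(Canada[, c("e", "U")], p = 2, type = "const")

# 1. Forecasts for every variable in the system
pred <- predict(fit, n.ahead = 8)

# 2. Granger causality test: do lags of U help to forecast e?
gc <- causality(fit, cause = "U")$Granger

# 3. Impulse response of e to a shock in U (no bootstrap intervals, for speed)
resp <- irf(fit, impulse = "U", response = "e", n.ahead = 8, boot = FALSE)

# 4. Forecast error variance decomposition
dec <- fevd(fit, n.ahead = 8)
dec$e   # each row gives the shares attributed to e and U at one horizon
```

The impulse responses and the variance decomposition depend on the ordering of the variables in the system, so results from a sketch like this should be interpreted with care.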

## Example 9.4 A VAR model for forecasting US consumption

The R output below shows the lag length selected by each of the information criteria available in the vars package. There is a large discrepancy between the VAR(5) selected by the AIC and the VAR(1) selected by the BIC. This is not unusual. We therefore first fit a VAR(1), as selected by the BIC. In similar fashion to the univariate ARIMA methodology, we test that the residuals are uncorrelated using a Portmanteau test.4 The null hypothesis of no serial correlation in the residuals is rejected for both a VAR(1) and a VAR(2), so we fit a VAR(3), for which the null is not rejected. The forecasts generated by the VAR(3) are plotted in Figure 9.8.

R output
```
> library(vars)
> VARselect(usconsumption, lag.max=8, type="const")$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
     5      1      1      5
> var <- VAR(usconsumption, p=3, type="const")
> serial.test(var, lags.pt=10, type="PT.asymptotic")
Portmanteau Test (asymptotic)
data:  Residuals of VAR object var
Chi-squared = 33.3837, df = 28, p-value = 0.2219

> summary(var)
VAR Estimation Results:
=========================
Endogenous variables: consumption, income
Deterministic variables: const
Sample size: 161

Estimation results for equation consumption:
============================================
               Estimate Std. Error t value Pr(>|t|)
consumption.l1  0.22280    0.08580   2.597 0.010326 *
income.l1       0.04037    0.06230   0.648 0.518003
consumption.l2  0.20142    0.09000   2.238 0.026650 *
income.l2      -0.09830    0.06411  -1.533 0.127267
consumption.l3  0.23512    0.08824   2.665 0.008530 **
income.l3      -0.02416    0.06139  -0.394 0.694427
const           0.31972    0.09119   3.506 0.000596 ***

Estimation results for equation income:
=======================================
               Estimate Std. Error t value Pr(>|t|)
consumption.l1  0.48705    0.11637   4.186 4.77e-05 ***
income.l1      -0.24881    0.08450  -2.945 0.003736 **
consumption.l2  0.03222    0.12206   0.264 0.792135
income.l2      -0.11112    0.08695  -1.278 0.203170
consumption.l3  0.40297    0.11967   3.367 0.000959 ***
income.l3      -0.09150    0.08326  -1.099 0.273484
const           0.36280    0.12368   2.933 0.003865 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation matrix of residuals:
            consumption income
consumption      1.0000 0.3639
income           0.3639 1.0000
```

Figure 9.8: Forecasts for US consumption and income generated from a VAR(3).

R code
```r
# var is the VAR(3) object fitted in the R output above
fcst <- forecast(var)
plot(fcst, xlab="Year")
```

1. A more flexible generalisation would be a Vector ARMA process; however, the issues involved in identifying and estimating such a process are beyond the scope of this book. In fact, the simplicity of VARs in contrast to VARMAs is generally considered a great advantage and has led to the dominance of VARs in forecasting. Interested readers may refer to G. Athanasopoulos, D. S. Poskitt and F. Vahid (2012). Two Canonical VARMA Forms: Scalar Component Models Vis-à-Vis the Echelon Form. Econometric Reviews 31(1), 60–83.

3. For a detailed comparison of these criteria, see Chapter 4.3 of H. Lütkepohl (2005). New introduction to multiple time series analysis. Berlin: Springer-Verlag.

4. The tests for serial correlation in the “vars” package are multivariate generalisations of the tests presented in Section 2/6.