2.3 Some simple forecasting methods

Some forecasting methods are very simple and surprisingly effective. Here are four methods that we will use as benchmarks for other forecasting methods.

Average method

Here, the forecasts of all future values are equal to the mean of the historical data. If we let the historical data be denoted by $y_{1},\dots,y_{T}$, then we can write the forecasts as

$$ \hat{y}_{T+h|T} = \bar{y} = (y_{1}+\dots+y_{T})/T. $$

The notation $\hat{y}_{T+h|T}$ is a short-hand for the estimate of $y_{T+h}$ based on the data $y_1,\dots,y_T$.

Although we have used time series notation here, this method can also be used for cross-sectional data (when we are predicting a value not included in the data set). Then the prediction for values not observed is the average of those values that have been observed. The remaining methods in this section are only applicable to time series data.

R code
meanf(y, h)
# y contains the time series
# h is the forecast horizon
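
As a minimal sketch of what the average method computes, the following base R snippet builds the point forecasts by hand. The series `y` and horizon `h` below are made-up illustrative values, not data from this book, and the snippet does not attempt to reproduce anything else that `meanf` returns.

```r
# Hypothetical toy series (illustrative values only)
y <- c(10, 14, 12, 16, 18)
h <- 3  # forecast horizon

# Every forecast equals the mean of the historical data
mean_fc <- rep(mean(y), h)
print(mean_fc)
```

For this toy series the mean is 14, so all three forecasts equal 14 regardless of the horizon.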

Naïve method

This method is only appropriate for time series data. All forecasts are simply set to the value of the last observation; that is, the forecasts of all future values are set to $y_{T}$, the last observed value. This method works remarkably well for many economic and financial time series.

R code
naive(y, h)
rwf(y, h) # Alternative
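
The same idea can be sketched in base R without any forecasting package. The series `y` and horizon `h` are again made-up illustrative values, and only the point forecasts are computed.

```r
# Hypothetical toy series (illustrative values only)
y <- c(10, 14, 12, 16, 18)
h <- 3  # forecast horizon

# Every forecast equals the last observed value, y_T
naive_fc <- rep(y[length(y)], h)
print(naive_fc)
```

Here the last observation is 18, so each of the three forecasts is 18.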

Seasonal naïve method

A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year). Formally, the forecast for time $T+h$ is written as

$$ y_{T+h-km}, \quad \text{where } m = \text{seasonal period and } k=\lfloor (h-1)/m\rfloor+1, $$

and $\lfloor u \rfloor$ denotes the integer part of $u$. That looks more complicated than it really is. For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.

R code
snaive(y, h)
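
To make the index $T+h-km$ concrete, here is a small base R sketch that applies the formula directly. The quarterly series `y` is a made-up example (two years of illustrative values), and `n` plays the role of $T$ because `T` is shorthand for `TRUE` in R.

```r
# Hypothetical quarterly toy series: two full years (illustrative values only)
y <- c(10, 20, 30, 40, 11, 21, 31, 41)
m <- 4          # seasonal period (quarterly data)
h <- 1:6        # forecast horizons
n <- length(y)  # T in the notation of the text

# k is the number of whole seasonal periods to step back from time T + h
k <- floor((h - 1) / m) + 1

# The forecast for time T + h is the observation at time T + h - k * m
snaive_fc <- y[n + h - k * m]
print(snaive_fc)
```

The forecasts are 11, 21, 31, 41, 11, 21: each horizon picks up the last observed value from the same quarter, and horizons 5 and 6 wrap around to reuse the values for the first and second quarters.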

Drift method

A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. So the forecast for time $T+h$ is given by

$$ y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_{t}-y_{t-1}) = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right). $$

This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.

R code
rwf(y, h, drift=TRUE)
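
The two forms of the drift forecast can be checked numerically with a short base R sketch. The series `y` is a made-up example, and `n` stands in for $T$; only the point forecasts are computed.

```r
# Hypothetical toy series (illustrative values only)
y <- c(10, 14, 12, 16, 18)
n <- length(y)  # T in the notation of the text
h <- 1:3        # forecast horizons

# Drift: the average of the one-step changes in the historical data
drift <- mean(diff(y))

# Equivalent form: slope of the line through the first and last observations
slope <- (y[n] - y[1]) / (n - 1)

# Forecast for time T + h is the last value plus h times the drift
drift_fc <- y[n] + h * drift
print(drift_fc)
print(all.equal(drift, slope))
```

For this series the changes are 4, -2, 4 and 2, giving a drift of 2, which matches the slope (18 - 10)/4. The forecasts are therefore 20, 22 and 24.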


Figure 2.13 shows the first three methods applied to the quarterly beer production data.

Figure 2.13: Forecasts of Australian quarterly beer production.

R code
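library(fpp)  # provides ausbeer and loads the forecasting functions
beer2 <- window(ausbeer,start=1992,end=2006-.1)
beerfit1 <- meanf(beer2, h=11)
beerfit2 <- naive(beer2, h=11)
beerfit3 <- snaive(beer2, h=11)

# Plot the mean-method forecasts without prediction intervals,
# then overlay the point forecasts from the other two methods
plot(beerfit1, plot.conf=FALSE,
  main="Forecasts for quarterly beer production")
lines(beerfit2$mean,col=2)
lines(beerfit3$mean,col=3)
legend("topright",lty=1,col=c(4,2,3),
  legend=c("Mean method","Naive method","Seasonal naive method"))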

In Figure 2.14, the non-seasonal methods were applied to a series of 250 days of the Dow Jones Index.

Figure 2.14: Forecasts based on 250 days of the Dow Jones Index.

R code
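dj2 <- window(dj,end=250)
plot(dj2,main="Dow Jones Index (daily ending 15 Jul 94)",xlim=c(1,292))
lines(meanf(dj2,h=42)$mean,col=4)  # h=42 is an assumed horizon
lines(rwf(dj2,h=42)$mean,col=2)
lines(rwf(dj2,drift=TRUE,h=42)$mean,col=3)
legend("topleft",lty=1,col=c(4,2,3),legend=c("Mean method","Naive method","Drift method"))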

Sometimes one of these simple methods will be the best forecasting method available. But in many cases, these methods will serve as benchmarks rather than the method of choice. That is, whatever forecasting methods we develop, they will be compared to these simple methods to ensure that the new method is better than these simple alternatives. If not, the new method is not worth considering.