7.4 Damped trend methods

The forecasts generated by Holt’s linear method display a constant trend (increasing or decreasing) indefinitely into the future. Even more extreme are the forecasts generated by the exponential trend method, which include exponential growth or decline.

Empirical evidence indicates that these methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner and McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line some time in the future. Methods that include a damped trend have proven to be very successful and are arguably the most popular individual methods when forecasts are required automatically for many series.

Additive damped trend

In conjunction with the smoothing parameters $\alpha$ and $\beta^*$ (with values between 0 and 1 as in Holt’s method), this method also includes a damping parameter $0<\phi<1$ :

\begin{align*} \pred{y}{t+h}{t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h}) b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*}

If $\phi=1$ the method is identical to Holt’s linear method. For values between $0$ and $1$, $\phi$ dampens the trend so that the forecasts approach a constant as the forecast horizon increases. In fact, the forecasts converge to $\ell_T+\phi b_T/(1-\phi)$ as $h\rightarrow\infty$ for any value $0<\phi<1$. The effect of this is that short-run forecasts are trended while long-run forecasts are constant.
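
The convergence of the forecast function can be checked numerically. The following R sketch is a minimal illustration only: the function name `damped_forecast` and the values chosen for the final level, the final slope and $\phi$ are hypothetical and are not taken from any data set in this chapter.

```r
# Additive damped trend forecasts:
# level + (phi + phi^2 + ... + phi^h) * slope, for horizons 1..h.
damped_forecast <- function(level, slope, phi, h) {
  level + cumsum(phi^seq_len(h)) * slope
}

# Hypothetical final states, chosen only for illustration.
level_T <- 100
slope_T <- 5
phi     <- 0.9

fc    <- damped_forecast(level_T, slope_T, phi, h = 200)
limit <- level_T + phi * slope_T / (1 - phi)   # 100 + 0.9 * 5 / 0.1 = 145

fc[1:3]                # 104.500 108.550 112.195
abs(fc[200] - limit)   # very close to zero

# With phi = 1 the same function gives Holt's linear forecasts.
damped_forecast(level_T, slope_T, phi = 1, h = 3)   # 105 110 115
```

With these values the forecasts rise by 4.5, then 4.05, then 3.645, and so on: each increment is $\phi$ times the previous one, which is why the forecast path flattens out.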

The error correction form of the smoothing equations is

\begin{align*} \ell_{t} &= \ell_{t-1} + \phi b_{t-1} + \alpha e_{t} \\ b_{t} &= \phi b_{t-1}+ \alpha \beta^*e_{t}. \end{align*}
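
To see that the two forms describe the same recursion, the sketch below runs both on a short series and compares the final states. It assumes that $e_t$ is the one-step forecast error $y_t - (\ell_{t-1} + \phi b_{t-1})$; the series, the parameter values and the initial states are hypothetical.

```r
# Hypothetical series, parameters and initial states (illustration only).
y         <- c(10, 12, 13, 15, 18, 19, 21)
alpha     <- 0.8
beta_star <- 0.2
phi       <- 0.9

l_comp <- 9; b_comp <- 1.5   # states updated with the component form
l_err  <- 9; b_err  <- 1.5   # states updated with the error correction form

for (t in seq_along(y)) {
  # Component form.
  l_new  <- alpha * y[t] + (1 - alpha) * (l_comp + phi * b_comp)
  b_comp <- beta_star * (l_new - l_comp) + (1 - beta_star) * phi * b_comp
  l_comp <- l_new

  # Error correction form; e is the one-step forecast error.
  e      <- y[t] - (l_err + phi * b_err)
  l_new2 <- l_err + phi * b_err + alpha * e
  b_err  <- phi * b_err + alpha * beta_star * e
  l_err  <- l_new2
}

all.equal(c(l_comp, b_comp), c(l_err, b_err))   # TRUE
```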

Example 7.2 Air Passengers (continued)

Figure 7.3 shows the one-step within-sample forecasts, and the forecasts for years 2005–2010, generated from Holt’s linear trend method, the exponential trend method and the additive damped trend method. The most optimistic forecasts come from the exponential trend method and the least optimistic from the damped trend method, with the forecasts generated by Holt’s linear trend method somewhere between the two.

Multiplicative damped trend

Motivated by the improvements in forecasting performance seen in the additive damped trend case, Taylor (2003) introduced a damping parameter to the exponential trend method, resulting in a multiplicative damped trend method:

\begin{align*} \pred{y}{t+h}{t} &= \ell_{t}b_{t}^{(\phi+\phi^2 + \dots + \phi^{h})} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} b^\phi_{t-1}\\ b_{t} &= \beta^*\frac{\ell_{t}}{ \ell_{t-1}} + (1 -\beta^*)b_{t-1}^{\phi}. \end{align*}

This method will produce less conservative forecasts than the additive damped trend method when compared to Holt’s linear method. The error correction form of the smoothing equations is

\begin{align*} \ell_{t} &= \ell_{t-1} b^\phi_{t-1}+\alpha e_{t}\\ b_{t} &= b^\phi_{t-1}+ \alpha \beta^*\frac{e_{t}}{\ell_{t-1}}. \end{align*}
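
The error correction form follows from the smoothing equations by substitution. As a sketch, assume that $e_t$ denotes the one-step forecast error, $e_t = y_t - \ell_{t-1}b^\phi_{t-1}$, so that $y_t = \ell_{t-1}b^\phi_{t-1} + e_t$. Then

\begin{align*}
\ell_{t} &= \alpha(\ell_{t-1} b^\phi_{t-1} + e_{t}) + (1 - \alpha)\ell_{t-1} b^\phi_{t-1} = \ell_{t-1} b^\phi_{t-1} + \alpha e_{t} \\
b_{t} &= \beta^*\frac{\ell_{t-1} b^\phi_{t-1} + \alpha e_{t}}{\ell_{t-1}} + (1 -\beta^*)b_{t-1}^{\phi} = b^\phi_{t-1} + \alpha \beta^*\frac{e_{t}}{\ell_{t-1}}.
\end{align*}

By the same geometric-series argument as in the additive case, $\phi+\phi^2+\dots+\phi^h \rightarrow \phi/(1-\phi)$ as $h\rightarrow\infty$ when $0<\phi<1$, so the forecasts converge to $\ell_T b_T^{\phi/(1-\phi)}$.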

Example 7.3 Sheep in Asia

In this example we compare the forecasting performance of all the non-seasonal methods we have considered so far in forecasting the sheep livestock population in Asia. The data spans the period 1970–2005. We withhold the period 2001–2005 as a test set, and use the data up to and including year 2000 for the training set (see Section 2.5 for a definition of training and test sets). Figure 7.5 shows the data and the forecasts from all methods.

The parameters and initial values of all methods are estimated by minimizing the SSE (as specified in Equation (7.3)) over the training set. In Table 7.4 we present the estimation results and error measures over the training and the test sets.
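
The estimation step can be sketched in base R for the additive damped trend method. This is not the code used by `holt()`; it is a minimal illustration that assumes the SSE is the sum of squared one-step forecast errors. The helper `damped_sse`, the simulated series, the starting values and the bounds (in particular the lower bound on $\phi$) are all hypothetical choices.

```r
# SSE of the one-step forecasts from the additive damped trend method.
# par = c(alpha, beta_star, phi, l0, b0)
damped_sse <- function(par, y) {
  alpha <- par[1]; beta_star <- par[2]; phi <- par[3]
  level <- par[4]; slope <- par[5]
  sse <- 0
  for (t in seq_along(y)) {
    e     <- y[t] - (level + phi * slope)          # one-step forecast error
    level <- level + phi * slope + alpha * e       # uses the previous slope
    slope <- phi * slope + alpha * beta_star * e   # then update the slope
    sse   <- sse + e^2
  }
  sse
}

# Hypothetical trending series (not the livestock data).
set.seed(1)
y <- 250 + 5 * (1:30) + rnorm(30, sd = 5)

# Illustrative starting values and box constraints.
start <- c(alpha = 0.5, beta_star = 0.1, phi = 0.9,
           l0 = y[1], b0 = y[2] - y[1])
fit <- optim(start, damped_sse, y = y, method = "L-BFGS-B",
             lower = c(0, 0, 0.8, -Inf, -Inf),
             upper = c(1, 1, 0.98, Inf, Inf))

round(fit$par, 3)   # estimated parameters and initial states
fit$value           # minimised SSE
```

The upper bound of 0.98 on $\phi$ mirrors the restriction reported with Table 7.4; the other bounds simply keep the smoothing parameters between 0 and 1.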

R code
livestock2 <- window(livestock,start=1970,end=2000)
fit1 <- ses(livestock2)
fit2 <- holt(livestock2)
fit3 <- holt(livestock2,exponential=TRUE)
fit4 <- holt(livestock2,damped=TRUE)
fit5 <- holt(livestock2,exponential=TRUE,damped=TRUE)
# Results for first model:
fit1$model
accuracy(fit1) # training set
accuracy(fit1,livestock) # test set
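Table 7.4: Estimated parameters and error measures for the non-seasonal methods.

| | SES | Holt's | Exponential | Additive damped | Multiplicative damped |
|---|---|---|---|---|---|
| $\alpha$ | 1 | 0.98 | 0.98 | 0.99 | 0.98 |
| $\beta^*$ | | 0 | 0 | 0 | 0.00 |
| $\phi$ | | | | 0.98† | 0.98 |
| $\ell_0$ | 263.92 | 257.78 | 255.52 | 254.58 | 254.69 |
| $b_0$ | | 5.01 | 1.01 | 5.39 | 1.02 |
| RMSE (training set) | 14.77 | 13.92 | 14.06 | 14.00 | 14.03 |
| SSE (training set) | 6761.47 | 6006.06 | 6128.46 | 6080.26 | 6100.11 |
| MAE (test set) | 20.38 | 10.69 | 9.64 | 14.18 | 11.77 |
| RMSE (test set) | 25.46 | 11.88 | 12.50 | 15.78 | 12.62 |
| MAPE (test set) | 4.60 | 2.54 | 2.33 | 3.26 | 2.76 |
| MASE (test set) | 2.26 | 1.19 | 1.07 | 1.57 | 1.31 |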

† the parameter is restricted to $\phi \le 0.98$. See text for more details.
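
The first RMSE row of Table 7.4 and the SSE row carry the same information. As a quick check, and assuming the RMSE there is computed as $\sqrt{\text{SSE}/n}$ with $n=31$ training observations (1970–2000), the following sketch recovers one row from the other. The vector names are hypothetical; the numbers are copied from the table.

```r
# Training-set SSE values copied from Table 7.4.
sse <- c(SES = 6761.47, Holt = 6006.06, Exponential = 6128.46,
         AdditiveDamped = 6080.26, MultiplicativeDamped = 6100.11)

n_train <- length(1970:2000)   # 31 annual observations in the training set

rmse <- sqrt(sse / n_train)
round(rmse, 2)   # 14.77 13.92 14.06 14.00 14.03
```

Because the RMSE is a monotone transformation of the SSE for a fixed $n$, the two rows always rank the methods identically.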

For the simple exponential smoothing method, the estimated smoothing parameter is $\alpha=1$. This is expected as the series is clearly trending over time and simple exponential smoothing requires the largest possible adjustment in each step to capture this trend.

For the other methods, there is also a trend component. With the exception of the multiplicative damped trend method, the smoothing parameter for the slope is estimated to be zero, indicating that the trend is not changing over time. Of course, the trend estimated using the damped trend methods will change in the future due to the damping.

In Figure 7.4, we plot the level and trend components for Holt’s method and for the additive damped trend method. The slope of the trend component for Holt's method is constant, showing that the trend is linear. In contrast, the slope of the trend component for the damped trend method is decreasing, showing that the trend is levelling off.

Figure 7.4: Level and slope components for Holt’s linear trend method and the additive damped trend method.

R code
plot(fit2$model$state)
plot(fit4$model$state)

Figure 7.5: Forecasting livestock, sheep in Asia: comparing forecasting performance of non-seasonal methods.

R code
plot(fit3, type="o", ylab="Livestock, sheep in Asia (millions)",
  flwd=1, plot.conf=FALSE)
lines(window(livestock,start=2001),type="o")
lines(fit1$mean,col=2)
lines(fit2$mean,col=3)
lines(fit4$mean,col=5)
lines(fit5$mean,col=6)
legend("topleft", lty=1, pch=1, col=1:6,
    c("Data","SES","Holt's","Exponential",
      "Additive Damped","Multiplicative Damped"))

For the additive damped trend method, the damping parameter $\phi$ is restricted to a maximum of 0.98 (the unrestricted estimation returned an optimal value of $\phi=1$). This restriction is imposed to ensure that the additive damped trend method generates forecasts that are noticeably different from those of Holt’s linear method; with $\phi=1$ the two methods would produce identical forecasts.

The SSE measures calculated over the training set show that Holt’s linear trend method provides the best fit to the data, followed by the additive damped trend method. Simple exponential smoothing generates the largest within-sample one-step errors. In Figure 7.5 we can examine the forecasts generated by the methods. Pretending that we have not seen the data over the test set, we would conclude that all the forecasts are quite plausible, especially those from the methods that account for the trend in the data.

Comparing the forecasting performance of the methods over the test set in Table 7.4, the exponential trend method is the most accurate according to the MAE, MAPE and MASE, while Holt’s linear method is the most accurate according to the RMSE. Conflicting results like this are very common when performing forecasting competitions between methods. As forecasting tasks can vary along many dimensions (length of forecast horizon, size of test set, forecast error measures, frequency of data, etc.), it is unlikely that one method will be better than all others for all forecasting scenarios. What we require from a forecasting method are consistently sensible forecasts, and these should be frequently evaluated against the task at hand.
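
A small sketch shows how error measures can disagree. The test-set values and the two sets of forecasts below are made up for illustration (they are not the livestock forecasts), and the helper functions `mae`, `rmse` and `mape` are hypothetical names written out from the usual definitions. The MASE is omitted because it also requires the training data for scaling.

```r
# Hypothetical test-set values and forecasts from two made-up methods.
actual   <- c(100, 100, 100, 100, 100)
method_a <- c(101, 101, 101, 101, 110)   # small errors plus one large miss
method_b <- c(104, 104, 104, 104, 104)   # moderate errors throughout

mae  <- function(actual, fc) mean(abs(actual - fc))
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))
mape <- function(actual, fc) 100 * mean(abs((actual - fc) / actual))

results <- rbind(
  A = c(MAE = mae(actual, method_a), RMSE = rmse(actual, method_a),
        MAPE = mape(actual, method_a)),
  B = c(MAE = mae(actual, method_b), RMSE = rmse(actual, method_b),
        MAPE = mape(actual, method_b))
)
round(results, 2)
```

Method A has the smaller MAE and MAPE (2.8 against 4), but its single large miss gives it the larger RMSE (about 4.56 against 4), because squaring penalises large errors more heavily.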