# 7.1 Simple exponential smoothing

The simplest of the exponential smoothing methods is naturally called “simple exponential smoothing” (SES). (In some books, it is called “single exponential smoothing”.) This method is suitable for forecasting data with no trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality, although the mean of the data may be changing slowly over time. We have already considered the naïve and the average as possible methods for forecasting such data (Section 2/3).

```r
# Assumes oildata holds the annual oil production series for 1996-2007 (a ts object)
plot(oildata, ylab="Oil (millions of tonnes)", xlab="Year")
```

Using the naïve method, all forecasts for the future are equal to the last observed value of the series,

$$\hat{y}_{T+h|T} = y_T,$$

for $h=1,2,\dots$. Hence, the naïve method assumes that the most current observation is the only important one and all previous observations provide no information for the future. This can be thought of as a weighted average where all the weight is given to the last observation.

Using the average method, all future forecasts are equal to a simple average of the observed data,

$$\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t,$$

for $h=1,2,\dots$. Hence, the average method assumes that all observations are of equal importance and they are given equal weight when generating forecasts.

We often want something between these two extremes. For example, it may be sensible to attach larger weights to more recent observations than to observations from the distant past. This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past, so that the smallest weights are associated with the oldest observations:

\begin{equation}\tag{7.1}\label{eq-7-ses} \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots, \end{equation}

where $0 \le \alpha \le 1$ is the smoothing parameter. The one-step-ahead forecast for time $T+1$ is a weighted average of all the observations in the series $y_1,\dots,y_T$. The rate at which the weights decrease is controlled by the parameter $\alpha$.

Table 7.1 shows the weights attached to observations for four different values of $\alpha$ when forecasting using simple exponential smoothing. Note that, even for a small $\alpha$, the sum of the weights will be approximately one for any reasonable sample size.

Observation | $\alpha=0.2$ | $\alpha=0.4$ | $\alpha=0.6$ | $\alpha=0.8$ |
---|---|---|---|---|
$y_{T}$ | $0.2$ | $0.4$ | $0.6$ | $0.8$ |
$y_{T-1}$ | $0.16$ | $0.24$ | $0.24$ | $0.16$ |
$y_{T-2}$ | $0.128$ | $0.144$ | $0.096$ | $0.032$ |
$y_{T-3}$ | $0.1024$ | $0.0864$ | $0.0384$ | $0.0064$ |
$y_{T-4}$ | $(0.2)(0.8)^4$ | $(0.4)(0.6)^4$ | $(0.6)(0.4)^4$ | $(0.8)(0.2)^4$ |
$y_{T-5}$ | $(0.2)(0.8)^5$ | $(0.4)(0.6)^5$ | $(0.6)(0.4)^5$ | $(0.8)(0.2)^5$ |

For any $\alpha$ between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If $\alpha$ is small (i.e., close to 0), more weight is given to observations from the more distant past. If $\alpha$ is large (i.e., close to 1), more weight is given to the more recent observations. In the extreme case where $\alpha=1$, $\hat{y}_{T+1|T}=y_T$ and the forecasts are equal to the naïve forecasts.
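The weights in Table 7.1 can be reproduced with a few lines of base R. This is a minimal sketch: the helper name `ses_weights` and the choice of 20 observations for the sum are illustrative assumptions, not part of the original text.

```r
# Weight attached to y_{T-j} under simple exponential smoothing: alpha * (1 - alpha)^j
# (ses_weights is a hypothetical helper name used only for this illustration)
ses_weights <- function(alpha, n) {
  j <- 0:(n - 1)
  alpha * (1 - alpha)^j
}

alphas <- c(0.2, 0.4, 0.6, 0.8)

# Rows are lags j = 0..5 (y_T back to y_{T-5}); columns are the four alpha values
w <- sapply(alphas, ses_weights, n = 6)
dimnames(w) <- list(paste0("y_{T-", 0:5, "}"), paste0("alpha=", alphas))
print(round(w, 5))

# Sum of the weights over the last 20 observations, which equals 1 - (1 - alpha)^20
total_20 <- sapply(alphas, function(a) sum(ses_weights(a, 20)))
print(round(total_20, 4))
```

The second printout illustrates why the weights sum to approximately one for any reasonable sample size: the shortfall is $(1-\alpha)^n$, which shrinks geometrically in $n$.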

We present three equivalent forms of simple exponential smoothing, each of which leads to the forecast equation (\ref{eq-7-ses}).

## Weighted average form
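A sketch of this form, reconstructed from the standard recursion for simple exponential smoothing and from the later reference to the $\ell_0$ term; the equation number 7.2 is an assumption inferred from the surrounding numbering. The forecast at time $t+1$ is a weighted average of the most recent observation $y_t$ and the most recent forecast $\hat{y}_{t|t-1}$:

$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}, \qquad t=1,\dots,T.$$

Denoting the first forecast of $y_1$ by $\ell_0$ and substituting each equation into the next gives

\begin{equation}\tag{7.2}\label{eq-7-waforecasts} \hat{y}_{T+1|T} = \sum_{j=0}^{T-1}\alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T\ell_0. \end{equation}

For large $T$ the last term becomes negligible and the sum takes the same weighted average form as equation (\ref{eq-7-ses}).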

## Component form
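A sketch of this form under the usual definition, in which the only component of simple exponential smoothing is the level $\ell_t$ (the equation labels below are the conventional ones and are assumed here):

\begin{align*}
\text{Forecast equation} && \hat{y}_{t+1|t} &= \ell_t \\
\text{Smoothing equation} && \ell_t &= \alpha y_t + (1-\alpha)\ell_{t-1}
\end{align*}

Replacing $\ell_t$ by $\hat{y}_{t+1|t}$ and $\ell_{t-1}$ by $\hat{y}_{t|t-1}$ in the smoothing equation turns it into a weighted average of the latest observation and the latest forecast.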

## Error correction form
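A sketch of this form, obtained by rearranging the standard level recursion $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$ (the recursion itself is assumed from the usual definition of the method):

$$\ell_t = \ell_{t-1} + \alpha(y_t - \ell_{t-1}) = \ell_{t-1} + \alpha e_t,$$

where $e_t = y_t - \ell_{t-1} = y_t - \hat{y}_{t|t-1}$ for $t=1,\dots,T$ is the one-step-ahead within-sample forecast error. If the error at time $t$ is negative, the previous level over-estimated $y_t$ and the new level is adjusted downwards; the closer $\alpha$ is to one, the larger the adjustment.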

## Multi-horizon forecasts

So far we have given forecast equations for only one step ahead. Simple exponential smoothing has a “flat” forecast function, and therefore for longer forecast horizons,

$$\pred{y}{T+h}{T}=\pred{y}{T+1}{T}=\ell_T, \qquad h=2,3,\dots. $$

Remember that these forecasts will only be suitable if the time series has no trend or seasonal component.
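The flat forecast function can be illustrated in base R. This is a minimal sketch that assumes the standard level recursion $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$; the function name `ses_levels` is hypothetical, and the data are the observed values listed in Table 7.2.

```r
# Level recursion l_t = alpha * y_t + (1 - alpha) * l_{t-1}, started from l0
# (ses_levels is a hypothetical helper, not a function from any package)
ses_levels <- function(y, alpha, l0) {
  level <- numeric(length(y) + 1)
  level[1] <- l0                       # level[1] stores l_0
  for (t in seq_along(y)) {
    level[t + 1] <- alpha * y[t] + (1 - alpha) * level[t]
  }
  level
}

# Observed values y_1..y_12 copied from Table 7.2 (rounded to one decimal)
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

level <- ses_levels(y, alpha = 0.2, l0 = y[1])
l_T <- level[length(level)]            # final level l_T

# Flat forecast function: every horizon h receives the same value l_T
h <- 3
forecasts <- rep(l_T, h)
print(round(forecasts, 1))
```

With $\alpha=0.2$ and $\ell_0=y_1$ the three forecasts should agree with the $\alpha=0.2$ column of Table 7.2, up to rounding of the input data.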

## Initialisation

The application of every exponential smoothing method requires the initialisation of the smoothing process. For simple exponential smoothing we need to specify an initial value for the level, $\ell_0$, which appears in the last term of equation (\ref{eq-7-waforecasts}). Hence $\ell_0$ plays a role in all forecasts generated by the process. In general, the weight attached to $\ell_0$ is small. However, in the case that $\alpha$ is small and/or the time series is relatively short, the weight may be large enough to have a noticeable effect on the resulting forecasts. Therefore, selecting suitable initial values can be quite important. A common approach is to set $\ell_0=y_1$ (recall that $\ell_0=\pred{y}{1}{0}$).
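To see when the initial level matters, note that under the standard weighted average expansion of the method the weight on $\ell_0$ in the forecast for time $T+1$ is $(1-\alpha)^T$. The sketch below tabulates that weight; the function name `l0_weight` and the particular values of $\alpha$ and $T$ are illustrative assumptions.

```r
# Weight on the initial level l0 after n_obs observations: (1 - alpha)^n_obs
# (l0_weight is a hypothetical helper used only for this illustration)
l0_weight <- function(alpha, n_obs) (1 - alpha)^n_obs

alphas   <- c(0.05, 0.2, 0.6)    # illustrative smoothing parameters
n_values <- c(12, 50, 200)       # illustrative series lengths T

# Rows: alpha values; columns: series lengths
w0 <- outer(alphas, n_values, l0_weight)
dimnames(w0) <- list(paste0("alpha=", alphas), paste0("T=", n_values))
print(signif(w0, 3))
```

For a short series with a small $\alpha$ (top-left cell) more than half of the forecast comes from $\ell_0$, whereas for larger $\alpha$ or longer series the weight is negligible.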

Other exponential smoothing methods that also involve a trend and/or a seasonal component require initial values for these components also. We tabulate common strategies for selecting initial values in Table 7.9.

An alternative approach (see below) is to use optimization to estimate the value of $\ell_0$ rather than set it to some value. Even if optimization is used, selecting appropriate initial values can assist the speed and precision of the optimization process.

## Optimization

For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter ($\alpha$), but for the methods that follow there is usually more than one smoothing parameter.

There are cases where the smoothing parameters may be chosen in a subjective manner: the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.

In Section 4/2 we estimated the coefficients of a regression model by minimizing the sum of the squared errors (SSE). Similarly, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE. The errors are specified as $e_t=y_t - \pred{y}{t}{t-1}$ for $t=1,\dots,T$ (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize

\begin{equation}\tag{7.3}\label{eq-7-SSE} \text{SSE}=\sum_{t=1}^T(y_t - \pred{y}{t}{t-1})^2=\sum_{t=1}^Te_t^2. \end{equation}

Unlike the regression case (where we have formulae that return the values of the regression coefficients which minimize the SSE), this is a non-linear minimization problem, and we need to use an optimization tool to solve it.
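As an illustration of this minimization, the sketch below uses base R's `optim()` to estimate $\alpha$ and $\ell_0$ jointly. It assumes the standard level update $\ell_t = \ell_{t-1} + \alpha e_t$; the function name `ses_sse`, the starting values and the choice of the L-BFGS-B method are assumptions made for this example, not the procedure used by any particular package.

```r
# SSE of the one-step-ahead within-sample errors for given alpha and l0
# (ses_sse is a hypothetical helper; par = c(alpha, l0))
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]                 # current level, starting at l0
  sse <- 0
  for (t in seq_along(y)) {
    e <- y[t] - level             # e_t = y_t - yhat_{t|t-1}
    sse <- sse + e^2
    level <- level + alpha * e    # level update in error correction form
  }
  sse
}

# Observed values y_1..y_12 copied from Table 7.2 (rounded to one decimal)
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# Box-constrained minimization: 0 <= alpha <= 1, l0 unconstrained
fit <- optim(par = c(0.5, y[1]), fn = ses_sse, y = y,
             method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))

alpha_hat <- fit$par[1]
l0_hat    <- fit$par[2]
print(c(alpha = alpha_hat, l0 = l0_hat, SSE = fit$value))
```

Because the input values are rounded and a different optimizer is used, the estimates need not match published figures exactly, but the minimized SSE should be no larger than the SSE for any fixed choice of $\alpha$ and $\ell_0$.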

## Example 7.1 Oil production

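```r
library(forecast)  # ses() comes from the forecast package

# Fixed alpha, with the initial level set to the first observation
fit1 <- ses(oildata, alpha=0.2, initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=3)
# alpha and the initial level are both estimated from the data
fit3 <- ses(oildata, h=3)

# Plot the data only: forecasts drawn in white, prediction intervals switched off
# (plot.conf is the argument name in older forecast releases; newer ones use PI)
plot(fit1, plot.conf=FALSE, ylab="Oil (millions of tonnes)",
  xlab="Year", main="", fcol="white", type="o")
# One-step-ahead within-sample forecasts
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="green", type="o")
# Forecasts for h = 1, 2, 3
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft", lty=1, col=c(1,"blue","red","green"),
  c("data", expression(alpha == 0.2), expression(alpha == 0.6),
  expression(alpha == 0.89)), pch=1)
```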

In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line in Figure 7.2 is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behaviour.

In Table 7.2 we demonstrate the application of simple exponential smoothing. The last three columns show the estimated level for times $t=0$ to $t=12$, followed by the forecasts for $h=1,2,3$, for three different values of $\alpha$. In the first two of these columns the smoothing parameter $\alpha$ is set to $0.2$ and $0.6$ respectively, and the initial level $\ell_0$ is set to $y_1$ in both cases. In the third column both the smoothing parameter and the initial level are estimated. Using an optimization tool, we find the values of $\alpha$ and $\ell_0$ that minimize the SSE, subject to the restriction that $0\le\alpha\le1$. Note that the SSE value presented in the last row of the table is smaller for this estimated $\alpha$ and $\ell_0$ than for the other values of $\alpha$ and $\ell_0$.

Year | Time period $t$ | Observed values $y_t$ | Level $\ell_t$, $\alpha=0.2$ | Level $\ell_t$, $\alpha=0.6$ | Level $\ell_t$, $\alpha=0.89^*$ |
---|---|---|---|---|---|
-- | 0 | -- | 446.7 | 446.7 | 447.5\* |
1996 | 1 | 446.7 | 446.7 | 446.7 | 446.7 |
1997 | 2 | 454.5 | 448.2 | 451.3 | 453.6 |
1998 | 3 | 455.7 | 449.7 | 453.9 | 455.4 |
1999 | 4 | 423.6 | 444.5 | 435.8 | 427.1 |
2000 | 5 | 456.3 | 446.8 | 448.1 | 453.1 |
2001 | 6 | 440.6 | 445.6 | 443.6 | 441.9 |
2002 | 7 | 425.3 | 441.5 | 432.6 | 427.1 |
2003 | 8 | 485.1 | 450.3 | 464.1 | 478.9 |
2004 | 9 | 506.0 | 461.4 | 489.3 | 503.1 |
2005 | 10 | 526.8 | 474.5 | 511.8 | 524.2 |
2006 | 11 | 514.3 | 482.5 | 513.3 | 515.3 |
2007 | 12 | 494.2 | 484.8 | 501.8 | 496.5 |
Forecasts $\hat{y}_{T+h\vert T}$ | $h$ | | | | |
2008 | 1 | -- | 484.8 | 501.8 | 496.5 |
2009 | 2 | -- | 484.8 | 501.8 | 496.5 |
2010 | 3 | -- | 484.8 | 501.8 | 496.5 |
MAE | | | 24.7 | 20.2 | 20.1 |
RMSE | | | 32.1 | 26.0 | 25.1 |
MAPE (%) | | | 5.1 | 4.2 | 4.3 |
SSE | | | 12391.7 | 8098.6 | 7573.4 |

*Table 7.2: Forecasting total oil production in millions of tonnes for Saudi Arabia using simple exponential smoothing with three different values for the smoothing parameter $\alpha$.*

*\* $\alpha=0.89$ and $\ell_0=447.5$ are obtained by minimizing SSE over periods $t = 1,2,\dots,12$.*
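The accuracy rows of Table 7.2 can be checked from the one-step-ahead within-sample errors. The sketch below is a base R illustration under stated assumptions: the helper names `ses_errors` and `accuracy_measures` are hypothetical, the measures are averaged over all $T=12$ periods, and the inputs are the rounded observed values from the table, so small discrepancies in the last digit are possible.

```r
# One-step-ahead within-sample errors e_t = y_t - l_{t-1}
# (ses_errors and accuracy_measures are hypothetical helper names)
ses_errors <- function(y, alpha, l0) {
  level <- l0
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - level
    level <- alpha * y[t] + (1 - alpha) * level   # update the level
  }
  e
}

# Summary measures over t = 1..T; MAPE is expressed in percent
accuracy_measures <- function(e, y) {
  c(MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MAPE = 100 * mean(abs(e / y)),
    SSE  = sum(e^2))
}

# Observed values y_1..y_12 copied from Table 7.2 (rounded to one decimal)
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

m02 <- accuracy_measures(ses_errors(y, alpha = 0.2, l0 = y[1]), y)
m06 <- accuracy_measures(ses_errors(y, alpha = 0.6, l0 = y[1]), y)
print(round(rbind("alpha=0.2" = m02, "alpha=0.6" = m06), 1))
```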

The three different sets of forecasts for the period 2008–2010 are plotted in Figure 7.2. Also plotted are the one-step-ahead within-sample forecasts alongside the data over the period 1996–2007. The influence of $\alpha$ on the smoothing process is clearly visible. The larger the $\alpha$, the greater the adjustment that takes place in the next forecast in the direction of the previous data point; a smaller $\alpha$ leads to less adjustment, and so the series of one-step within-sample forecasts is smoother.