7.1 Simple exponential smoothing

The simplest of the exponential smoothing methods is naturally called “simple exponential smoothing” (SES). (In some books, it is called “single exponential smoothing”.) This method is suitable for forecasting data with no trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality, although the mean of the data may be changing slowly over time. We have already considered the naïve and average methods as possible ways of forecasting such data (Section 2/3).

Figure 7.1: Oil production in Saudi Arabia from 1996 to 2007.

R code

```r
library(fpp)  # provides the oil data set and loads the forecast package
oildata <- window(oil, start=1996, end=2007)
plot(oildata, ylab="Oil (millions of tonnes)", xlab="Year")
```

Using the naïve method, all forecasts for the future are equal to the last observed value of the series,

$$\hat{y}_{T+h|T} = y_T, $$

for $h=1,2,\dots$. Hence, the naïve method assumes that the most recent observation is the only important one and all previous observations provide no information for the future. This can be thought of as a weighted average where all of the weight is given to the last observation.

Using the average method, all future forecasts are equal to a simple average of the observed data,

$$\hat{y}_{T+h|T} = \frac1T \sum_{t=1}^T y_t, $$ for $h=1,2,\dots$.

Hence, the average method assumes that all observations are of equal importance and they are given equal weight when generating forecasts.
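As a minimal base-R sketch (not code from the original text), both benchmark methods can be written as explicit weight vectors applied to the observations. The vector `y` holds the rounded oil production values from Table 7.2, and the names `w_naive` and `w_mean` are illustrative only.

```r
# Oil production in Saudi Arabia, 1996-2007 (millions of tonnes);
# assumption: rounded 1-decimal values typed in from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)
n <- length(y)  # T in the text; `T` is avoided because it is R's shorthand for TRUE

# Naive method: all of the weight on the last observation y_T
w_naive <- c(rep(0, n - 1), 1)
# Average method: equal weight 1/T on every observation
w_mean <- rep(1 / n, n)

naive_fc <- sum(w_naive * y)
mean_fc  <- sum(w_mean * y)

# Both forecast functions are flat: the same value for h = 1, 2, 3
h <- 3
rbind(naive = rep(naive_fc, h), average = rep(mean_fc, h))
```

Moving between the two extremes is only a matter of changing the weight vector.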

We often want something between these two extremes. For example, it may be sensible to attach larger weights to more recent observations than to observations from the distant past. This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages, where the weights decrease exponentially as observations come from further in the past; the smallest weights are associated with the oldest observations:

\begin{equation}\tag{7.1}\label{eq-7-ses} \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2}+ \cdots, \end{equation}

where $0 \le \alpha \le 1$ is the smoothing parameter. The one-step-ahead forecast for time $T+1$ is a weighted average of all the observations in the series $y_1,\dots,y_T$. The rate at which the weights decrease is controlled by the parameter $\alpha$.

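Table 7.1 shows the weights attached to observations for four different values of $\alpha$ when forecasting using simple exponential smoothing. The weights form a geometric series, so over a sample of $T$ observations they sum to $\sum_{j=0}^{T-1}\alpha(1-\alpha)^j = 1-(1-\alpha)^T$. Hence, even for a small $\alpha$, the sum of the weights will be approximately one for any reasonable sample size.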

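| Observation | $\alpha=0.2$ | $\alpha=0.4$ | $\alpha=0.6$ | $\alpha=0.8$ |
|---|---|---|---|---|
| $y_{T}$ | $0.2$ | $0.4$ | $0.6$ | $0.8$ |
| $y_{T-1}$ | $0.16$ | $0.24$ | $0.24$ | $0.16$ |
| $y_{T-2}$ | $0.128$ | $0.144$ | $0.096$ | $0.032$ |
| $y_{T-3}$ | $0.1024$ | $0.0864$ | $0.0384$ | $0.0064$ |
| $y_{T-4}$ | $(0.2)(0.8)^4$ | $(0.4)(0.6)^4$ | $(0.6)(0.4)^4$ | $(0.8)(0.2)^4$ |
| $y_{T-5}$ | $(0.2)(0.8)^5$ | $(0.4)(0.6)^5$ | $(0.6)(0.4)^5$ | $(0.8)(0.2)^5$ |

Table 7.1: Weights attached to the observations for four values of the smoothing parameter $\alpha$.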
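The weights in Table 7.1 can be reproduced with a few lines of base R; `ses_weights` is a hypothetical helper name. The second part checks that, with $T=12$ observations, the weights sum to $1-(1-\alpha)^T$ (about 0.93 for $\alpha=0.2$, and closer to one for larger $\alpha$).

```r
# Weight alpha * (1 - alpha)^j attached to observation y_{T-j} in (7.1)
ses_weights <- function(alpha, n_lags) {
  j <- 0:(n_lags - 1)
  alpha * (1 - alpha)^j
}

alphas <- c(0.2, 0.4, 0.6, 0.8)
# Rows: y_T, y_{T-1}, ..., y_{T-5}; columns: the four values of alpha
W <- sapply(alphas, ses_weights, n_lags = 6)
dimnames(W) <- list(c("y_T", paste0("y_T-", 1:5)), paste0("alpha=", alphas))
round(W, 4)

# Sum of the weights over T = 12 observations versus the geometric-series formula
n <- 12
sums   <- colSums(sapply(alphas, ses_weights, n_lags = n))
closed <- 1 - (1 - alphas)^n
rbind(sum_of_weights = sums, formula = closed)
```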

For any $\alpha$ between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If $\alpha$ is small (i.e., close to 0), more weight is given to observations from the more distant past. If $\alpha$ is large (i.e., close to 1), more weight is given to the more recent observations. At the extreme case where $\alpha=1$, $\hat{y}_{T+1|T}=y_T$ and forecasts are equal to the naïve forecasts.

We present three equivalent forms of simple exponential smoothing, each of which leads to the forecast equation (\ref{eq-7-ses}).

Weighted average form

The forecast at time $t+1$ is equal to a weighted average of the most recent observation $y_t$ and the most recent forecast $\hat{y}_{t|t-1}$,
$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}$$
for $t=1,\dots,T$, where $0 \le \alpha \le 1$ is the smoothing parameter.

The process has to start somewhere, so we let the first forecast of $y_1$ be denoted by $\ell_0$. Then
\begin{align*}
\hat{y}_{2|1} &= \alpha y_1 + (1-\alpha) \ell_0\\
\hat{y}_{3|2} &= \alpha y_2 + (1-\alpha) \hat{y}_{2|1}\\
\hat{y}_{4|3} &= \alpha y_3 + (1-\alpha) \hat{y}_{3|2}\\
\vdots\\
\hat{y}_{T+1|T} &= \alpha y_T + (1-\alpha) \hat{y}_{T|T-1}
\end{align*}
Substituting each equation into the following equation, we obtain
\begin{align*}
\hat{y}_{3|2} &= \alpha y_2 + (1-\alpha) \left[\alpha y_1 + (1-\alpha) \ell_0\right]\\
&= \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0\\
\hat{y}_{4|3} &= \alpha y_3 + (1-\alpha) \left[\alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0\right]\\
&= \alpha y_3 + \alpha(1-\alpha) y_2 + \alpha(1-\alpha)^2 y_1 + (1-\alpha)^3 \ell_0\\
&~~\vdots
\end{align*}
Continuing in this way up to time $T$ gives
\begin{align}
\hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_{0}. \tag{7.2}\label{eq-7-waforecasts}
\end{align}
Provided $\alpha>0$, the last term becomes negligible as $T$ grows, so the weighted average form leads to the same forecast equation (\ref{eq-7-ses}).
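A short base-R check (a sketch, not code from the original text) that running the weighted-average recursion and evaluating the closed form (7.2) give the same one-step forecast. It assumes the rounded Table 7.2 data, $\alpha=0.2$ and $\ell_0=y_1$; the result should agree with the last $\alpha=0.2$ level in Table 7.2 (484.8) to one decimal place.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)
alpha <- 0.2
l0 <- y[1]  # first forecast yhat[1|0] = l0, here set to y_1

# Recursive weighted-average form: yhat[t+1|t] = alpha*y_t + (1-alpha)*yhat[t|t-1]
fc <- l0
for (t in seq_along(y)) {
  fc <- alpha * y[t] + (1 - alpha) * fc
}
fc_recursive <- fc  # yhat[T+1|T]

# Closed form (7.2): sum_{j=0}^{T-1} alpha(1-alpha)^j y_{T-j} + (1-alpha)^T l0
n <- length(y)
j <- 0:(n - 1)
fc_closed <- sum(alpha * (1 - alpha)^j * y[n - j]) + (1 - alpha)^n * l0

c(recursive = fc_recursive, closed_form = fc_closed)
```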

Component form

An alternative representation is the component form. For simple exponential smoothing, the only component included is the level, $\ell_t$. (Other methods considered later in this chapter may also include a trend $b_t$ and a seasonal component $s_t$.) Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation for each of the components included in the method. The component form of simple exponential smoothing is given by:
\begin{align*}
\text{Forecast equation}&&\hat{y}_{t+1|t} &= \ell_{t}\\
\text{Smoothing equation}&&\ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1},
\end{align*}
where $\ell_{t}$ is the level (or the smoothed value) of the series at time $t$. The forecast equation shows that the forecast for time $t+1$ is the estimated level at time $t$. The smoothing equation for the level (usually referred to as the level equation) gives the estimated level of the series at each period $t$. Applying the forecast equation at time $T$ gives $\hat{y}_{T+1|T} = \ell_{T}$, the most recent estimated level. If we replace $\ell_t$ by $\hat{y}_{t+1|t}$ and $\ell_{t-1}$ by $\hat{y}_{t|t-1}$ in the smoothing equation, we recover the weighted average form of simple exponential smoothing.
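The level equation translates directly into a loop that returns $\ell_0,\dots,\ell_T$. This is a minimal sketch with a hypothetical helper `ses_levels`, run on the rounded Table 7.2 data with $\ell_0=y_1$; because the inputs are rounded, intermediate levels can differ from Table 7.2 in the last digit.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# Level equation l_t = alpha*y_t + (1 - alpha)*l_{t-1}; returns l_0, ..., l_T
ses_levels <- function(y, alpha, l0) {
  l <- numeric(length(y) + 1)
  l[1] <- l0  # R index 1 holds l_0, index t + 1 holds l_t
  for (t in seq_along(y)) {
    l[t + 1] <- alpha * y[t] + (1 - alpha) * l[t]
  }
  l
}

lev02 <- ses_levels(y, alpha = 0.2, l0 = y[1])
lev06 <- ses_levels(y, alpha = 0.6, l0 = y[1])
round(cbind(t = 0:length(y), alpha0.2 = lev02, alpha0.6 = lev06), 1)
```

The last row, $\ell_{12}$, is the one-step forecast $\hat{y}_{13|12}$ for each $\alpha$.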

Error correction form

The third form of simple exponential smoothing is obtained by rearranging the level equation in the component form to get what we refer to as the error correction form:
\begin{align*}
\ell_{t} &= \ell_{t-1}+\alpha( y_{t}-\ell_{t-1})\\
&= \ell_{t-1}+\alpha e_{t},
\end{align*}
where $e_{t}=y_{t}-\ell_{t-1}=y_{t}-\hat{y}_{t|t-1}$ for $t=1,\dots,T$. That is, $e_{t}$ is the one-step within-sample forecast error at time $t$. These within-sample forecast errors drive the adjustment (correction) of the estimated level throughout the smoothing process for $t=1,\dots,T$. For example, if the error at time $t$ is negative, then $\hat{y}_{t|t-1}>y_t$, so the level at time $t-1$ has been over-estimated. The new level $\ell_t$ is then the previous level $\ell_{t-1}$ adjusted downwards. The closer $\alpha$ is to one, the “rougher” the estimate of the level (large adjustments take place). The smaller the $\alpha$, the “smoother” the level (small adjustments take place).
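A sketch of the error correction form in base R (the helper `ses_error_correction` is hypothetical), applied to the rounded Table 7.2 data with $\alpha=0.2$ and $\ell_0=y_1$. As a cross-check it compares the levels with the component form, evaluated here as a first-order recursive filter via `stats::filter()`. In 1999 ($t=4$) the error is negative, so the level is corrected downwards.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# Error correction form: l_t = l_{t-1} + alpha * e_t, with e_t = y_t - l_{t-1}
ses_error_correction <- function(y, alpha, l0) {
  n <- length(y)
  l <- numeric(n + 1)  # l[t + 1] holds l_t
  e <- numeric(n)
  l[1] <- l0
  for (t in seq_len(n)) {
    e[t] <- y[t] - l[t]              # one-step within-sample forecast error
    l[t + 1] <- l[t] + alpha * e[t]  # correct the previous level by a fraction of the error
  }
  list(level = l, error = e)
}

fit <- ses_error_correction(y, alpha = 0.2, l0 = y[1])

# Component form l_t = alpha*y_t + (1 - alpha)*l_{t-1} as a recursive filter
lev_component <- as.numeric(stats::filter(0.2 * y, filter = 1 - 0.2,
                                          method = "recursive", init = y[1]))
all.equal(fit$level[-1], lev_component)

# t = 4 (1999): negative error, so l_4 is below l_3
c(error = fit$error[4], l3 = fit$level[4], l4 = fit$level[5])
```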

Multi-horizon forecasts

So far we have given forecast equations for only one step ahead. Simple exponential smoothing has a “flat” forecast function, and therefore for longer forecast horizons,
$$\hat{y}_{T+h|T}=\hat{y}_{T+1|T}=\ell_T, \qquad h=2,3,\dots.$$
Remember that these forecasts will only be suitable if the time series has no trend or seasonal component.
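A minimal sketch of the flat forecast function, with a hypothetical helper `ses_forecast`: run the level equation to $\ell_T$, then repeat that value for every horizon. It assumes the rounded Table 7.2 data, $\alpha=0.6$ and $\ell_0=y_1$.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# Flat forecast function: yhat[T+h|T] = l_T for every horizon h
ses_forecast <- function(y, alpha, l0, h) {
  l <- l0
  for (yt in y) l <- alpha * yt + (1 - alpha) * l  # run the level equation up to l_T
  rep(l, h)
}

fc <- ses_forecast(y, alpha = 0.6, l0 = y[1], h = 3)
round(fc, 1)
```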

Initialisation

The application of every exponential smoothing method requires the initialisation of the smoothing process. For simple exponential smoothing, we need to specify an initial value for the level, $\ell_0$, which appears in the last term of equation (\ref{eq-7-waforecasts}). Hence $\ell_0$ plays a role in all forecasts generated by the process. The weight attached to $\ell_0$ is $(1-\alpha)^T$, which is usually small. However, when $\alpha$ is small and/or the time series is relatively short, this weight may be large enough to have a noticeable effect on the resulting forecasts. Therefore, selecting suitable initial values can be quite important. A common approach is to set $\ell_0=y_1$ (recall that $\ell_0=\hat{y}_{1|0}$).
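Because $\ell_0$ enters (7.2) only through the term $(1-\alpha)^T\ell_0$, shifting $\ell_0$ by some amount shifts every forecast by that amount times $(1-\alpha)^T$. The sketch below (hypothetical helper `ses_final_level`, rounded Table 7.2 data with $T=12$) raises $\ell_0$ by 10 and measures the change in $\hat{y}_{T+1|T}$: about 0.69 for $\alpha=0.2$, but negligible for $\alpha=0.89$.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# Final level l_T, i.e. the forecast yhat[T+1|T]
ses_final_level <- function(y, alpha, l0) {
  l <- l0
  for (yt in y) l <- alpha * yt + (1 - alpha) * l
  l
}

shift  <- 10  # raise the initial level by 10 million tonnes
alphas <- c(0.2, 0.89)
effect <- sapply(alphas, function(a)
  ses_final_level(y, a, y[1] + shift) - ses_final_level(y, a, y[1]))

# The change equals shift * (1 - alpha)^T
cbind(alpha = alphas, effect = effect,
      predicted = shift * (1 - alphas)^length(y))
```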

Other exponential smoothing methods, which involve a trend and/or a seasonal component, also require initial values for these components. We tabulate common strategies for selecting initial values in Table 7.9.

An alternative approach (see below) is to use optimization to estimate the value of $\ell_0$ rather than setting it to some value. Even if optimization is used, selecting appropriate initial values can improve the speed and precision of the optimization process.

Optimization

For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter ($\alpha$), but for the methods that follow there is usually more than one smoothing parameter.

There are cases where the smoothing parameters may be chosen in a subjective manner: the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.

In Section 4/2 we estimated the coefficients of a regression model by minimizing the sum of the squared errors (SSE). Similarly, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE. The errors are specified as $e_t=y_t - \hat{y}_{t|t-1}$ for $t=1,\dots,T$ (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize
\begin{equation}\tag{7.3}\label{eq-7-SSE}
\text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2=\sum_{t=1}^Te_t^2.
\end{equation}
Unlike the regression case (where we have formulae that return the values of the regression coefficients which minimize the SSE), this involves a non-linear minimization problem, and we need to use an optimization tool to solve it.
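A sketch of this estimation in base R, assuming the rounded Table 7.2 data. The hypothetical function `ses_sse` evaluates (7.3) for a given $(\alpha,\ell_0)$, and R's general-purpose `optim()` with the bounded `"L-BFGS-B"` method enforces $0\le\alpha\le1$. This is one reasonable choice of optimizer, not necessarily the one used to produce Table 7.2; the estimates should land close to the $\alpha=0.89$ and $\ell_0=447.5$ reported there.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# SSE (7.3) as a function of par = c(alpha, l0)
ses_sse <- function(par, y) {
  alpha <- par[1]
  l <- par[2]           # l_0 = yhat[1|0]
  sse <- 0
  for (yt in y) {
    e <- yt - l         # one-step within-sample error e_t
    sse <- sse + e^2
    l <- l + alpha * e  # error correction form of the level equation
  }
  sse
}

# Non-linear minimization subject to 0 <= alpha <= 1 (l0 unrestricted)
fit <- optim(par = c(0.5, y[1]), fn = ses_sse, y = y,
             method = "L-BFGS-B", lower = c(0, -Inf), upper = c(1, Inf))
fit$par    # estimated alpha and l_0
fit$value  # minimized SSE
```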

Example 7.1 Oil production

Figure 7.2: Simple exponential smoothing applied to oil production in Saudi Arabia (1996–2007).

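R code

```r
# Fixed alpha with simple initialisation (l0 = y1, as in Table 7.2)
fit1 <- ses(oildata, alpha=0.2, initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6, initial="simple", h=3)
# Both alpha and l0 estimated by optimization
fit3 <- ses(oildata, h=3)
# Draw the data only: forecasts in white, no prediction intervals
# (PI=FALSE; older versions of the forecast package used plot.conf=FALSE)
plot(fit1, PI=FALSE, ylab="Oil (millions of tonnes)",
  xlab="Year", main="", fcol="white", type="o")
# One-step-ahead within-sample forecasts
lines(fitted(fit1), col="blue", type="o")
lines(fitted(fit2), col="red", type="o")
lines(fitted(fit3), col="green", type="o")
# Point forecasts for 2008-2010
lines(fit1$mean, col="blue", type="o")
lines(fit2$mean, col="red", type="o")
lines(fit3$mean, col="green", type="o")
legend("topleft", lty=1, col=c(1,"blue","red","green"),
  c("data", expression(alpha == 0.2), expression(alpha == 0.6),
  expression(alpha == 0.89)), pch=1)
```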

In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line in Figure 7.2 is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behaviour.

In Table 7.2 we demonstrate the application of simple exponential smoothing. The last three columns show the estimated level for times $t=0$ to $t=12$, followed by the forecasts for $h=1,2,3$, for three different values of $\alpha$. For the first two of these columns, the smoothing parameter $\alpha$ is set to $0.2$ and $0.6$ respectively, and the initial level $\ell_0$ is set to $y_1$ in both cases. In the third column, both the smoothing parameter and the initial level are estimated. Using an optimization tool, we find the values of $\alpha$ and $\ell_0$ that minimize the SSE, subject to the restriction that $0\le\alpha\le1$. Note that the SSE value presented in the last row of the table is smaller for this estimated $\alpha$ and $\ell_0$ than for the other values of $\alpha$ and $\ell_0$.

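| Year | Time period $t$ | Observed value $y_t$ | Level $\ell_t$ ($\alpha=0.2$) | Level $\ell_t$ ($\alpha=0.6$) | Level $\ell_t$ ($\alpha=0.89^*$) |
|---|---|---|---|---|---|
| -- | 0 | -- | 446.7 | 446.7 | 447.5* |
| 1996 | 1 | 446.7 | 446.7 | 446.7 | 446.7 |
| 1997 | 2 | 454.5 | 448.2 | 451.3 | 453.6 |
| 1998 | 3 | 455.7 | 449.7 | 453.9 | 455.4 |
| 1999 | 4 | 423.6 | 444.5 | 435.8 | 427.1 |
| 2000 | 5 | 456.3 | 446.8 | 448.1 | 453.1 |
| 2001 | 6 | 440.6 | 445.6 | 443.6 | 441.9 |
| 2002 | 7 | 425.3 | 441.5 | 432.6 | 427.1 |
| 2003 | 8 | 485.1 | 450.3 | 464.1 | 478.9 |
| 2004 | 9 | 506.0 | 461.4 | 489.3 | 503.1 |
| 2005 | 10 | 526.8 | 474.5 | 511.8 | 524.2 |
| 2006 | 11 | 514.3 | 482.5 | 513.3 | 515.3 |
| 2007 | 12 | 494.2 | 484.8 | 501.8 | 496.5 |
| **Forecasts** | $h$ | | $\hat{y}_{T+h\mid T}$ | $\hat{y}_{T+h\mid T}$ | $\hat{y}_{T+h\mid T}$ |
| 2008 | 1 | -- | 484.8 | 501.8 | 496.5 |
| 2009 | 2 | -- | 484.8 | 501.8 | 496.5 |
| 2010 | 3 | -- | 484.8 | 501.8 | 496.5 |
| MAE | | | 24.7 | 20.2 | 20.1 |
| RMSE | | | 32.1 | 26.0 | 25.1 |
| MAPE (%) | | | 5.1 | 4.2 | 4.3 |
| SSE | | | 12391.7 | 8098.6 | 7573.4 |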

Table 7.2: Forecasting total oil production in millions of tonnes for Saudi Arabia using simple exponential smoothing with three different values for the smoothing parameter $\alpha$.

*$\alpha=0.89$ and $\ell_0=447.5$ are obtained by minimizing the SSE over periods $t = 1,2,\dots,12$.
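The accuracy rows of Table 7.2 follow directly from the one-step within-sample errors. The sketch below computes them for the two fixed-$\alpha$ columns with $\ell_0=y_1$; `ses_errors` and `accuracy_row` are hypothetical helpers (the forecast package's `accuracy()` function reports these measures for fitted models). Because the inputs are the rounded values from the table, results can differ slightly from Table 7.2, the SSE in particular by a few units.

```r
# Assumption: rounded oil production values from Table 7.2
y <- c(446.7, 454.5, 455.7, 423.6, 456.3, 440.6,
       425.3, 485.1, 506.0, 526.8, 514.3, 494.2)

# One-step within-sample errors e_t = y_t - l_{t-1}
ses_errors <- function(y, alpha, l0) {
  e <- numeric(length(y))
  l <- l0
  for (t in seq_along(y)) {
    e[t] <- y[t] - l
    l <- l + alpha * e[t]
  }
  e
}

# Accuracy measures reported in Table 7.2
accuracy_row <- function(e, y) {
  c(MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MAPE = 100 * mean(abs(e / y)),
    SSE  = sum(e^2))
}

tab <- sapply(c(0.2, 0.6), function(a) accuracy_row(ses_errors(y, a, y[1]), y))
colnames(tab) <- c("alpha=0.2", "alpha=0.6")
round(tab, 1)
```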

The three different sets of forecasts for the period 2008–2010 are plotted in Figure 7.2. Also plotted are one-step-ahead within-sample forecasts alongside the data over the period 1996–2007. The influence of $\alpha$ on the smoothing process is clearly visible. The larger the $\alpha$, the greater the adjustment that takes place in the next forecast in the direction of the previous data point; a smaller $\alpha$ leads to less adjustment, and so the series of one-step within-sample forecasts is smoother.