4.1 The simple linear model
In this chapter, the forecast variable $y$ and the predictor variable $x$ are assumed to be related by the simple linear model: $$y = \beta_0 + \beta_1 x + \varepsilon.$$ An example of data from such a model is shown in Figure 4.1. The parameters $\beta_0$ and $\beta_1$ determine the intercept and the slope of the line respectively. The intercept $\beta_0$ represents the predicted value of $y$ when $x=0$. The slope $\beta_1$ represents the predicted increase in $y$ resulting from a one unit increase in $x$.
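To make the roles of $\beta_0$ and $\beta_1$ concrete, the following minimal Python sketch simulates data from the model. The parameter values (`BETA0`, `BETA1`, `SIGMA`), the normally distributed errors, and the function names are assumptions chosen only for illustration; they do not come from the data in Figure 4.1.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0 = 2.0   # intercept
BETA1 = 0.5   # slope
SIGMA = 1.0   # standard deviation of the error term (assumed normal here)


def systematic(x, beta0=BETA0, beta1=BETA1):
    """Systematic part of the model: beta0 + beta1 * x."""
    return beta0 + beta1 * x


def simulate(xs, seed=0):
    """Draw one y for each x as the systematic part plus a random error."""
    rng = random.Random(seed)
    return [systematic(x) + rng.gauss(0.0, SIGMA) for x in xs]


xs = [float(i) for i in range(20)]
ys = simulate(xs)

# The intercept is the predicted value of y at x = 0.
print("prediction at x = 0:", systematic(0.0))
# The slope is the change in the prediction for a one unit increase in x.
print("change per unit increase in x:", systematic(4.0) - systematic(3.0))
```

Plotting `ys` against `xs` would give a scatter of points around the line `systematic(x)` rather than points lying exactly on it.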
Notice that the observations do not lie on the straight line but are scattered around it. We can think of each observation $y_i$ as consisting of the systematic or explained part of the model, $\beta_0+\beta_1x_i$, and the random “error”, $\varepsilon_i$. The “error” term does not imply a mistake, but a deviation from the underlying straight line model. It captures anything that may affect $y_i$ other than $x_i$. We assume that these errors:
- have mean zero; otherwise the forecasts will be systematically biased.
- are not autocorrelated; otherwise the forecasts will be inefficient as there is more information to be exploited in the data.
- are unrelated to the predictor variable; otherwise there would be more information that should be included in the systematic part of the model.
It is also useful to have the errors normally distributed with constant variance in order to produce prediction intervals and to perform statistical inference. While these additional conditions make the calculations simpler, they are not necessary for forecasting.
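The three listed assumptions can each be expressed as a simple summary statistic: the mean of the errors, their lag-1 autocorrelation, and their correlation with the predictor. The sketch below computes these for simulated errors. It is an illustration only: the function names are hypothetical, the errors are generated as independent normal draws by assumption, and in practice the true errors are not observed.

```python
import random
from statistics import fmean


def correlation(a, b):
    """Sample (Pearson) correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def lag1_autocorrelation(e):
    """Sample autocorrelation of a sequence at lag 1."""
    m = fmean(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def error_checks(x, e):
    """Summaries matching the three assumptions; all should be near zero."""
    return {
        "mean": fmean(e),
        "lag1_autocorrelation": lag1_autocorrelation(e),
        "correlation_with_x": correlation(x, e),
    }


# Simulated errors that satisfy the assumptions by construction.
rng = random.Random(1)
x = [float(i) for i in range(200)]
e = [rng.gauss(0.0, 1.0) for _ in x]

for name, value in error_checks(x, e).items():
    print(f"{name}: {value:.3f}")
```

Values far from zero in any of the three summaries would suggest that the corresponding assumption is violated.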
Another important assumption in the simple linear model is that $x$ is not a random variable. If we were performing a controlled experiment in a laboratory, we could control the values of $x$ (so they would not be random) and observe the resulting values of $y$. With observational data (including most data in business and economics) it is not possible to control the value of $x$, and hence we make this an assumption.