4.5 Forecasting with regression

Forecasts from a simple linear model are easily obtained using the equation $$\hat{y}=\hat{\beta}_0+\hat{\beta}_1 x$$ where $x$ is the value of the predictor for which we require a forecast. That is, if we input a value of $x$ in the equation we obtain a corresponding forecast $\hat{y}$.

When this calculation is done using an observed value of $x$ from the data, we call the resulting value of $\hat{y}$ a “fitted value”. This is not a genuine forecast, because the actual value of $y$ for that predictor value was used in estimating the model, and so the value of $\hat{y}$ is affected by the true value of $y$. When $x$ is a new value (i.e., not part of the data that were used to estimate the model), the resulting value of $\hat{y}$ is a genuine forecast.
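To illustrate the distinction, here is a minimal sketch in R. The data are made up for the illustration (they are not the Car data), and the names `x`, `y`, `fitted_1` and `forecast_30` are hypothetical.

```r
# Hypothetical data, used only to illustrate the idea
set.seed(3)
x <- c(18, 22, 25, 28, 33, 36)
y <- 12.5 - 0.22 * x + rnorm(length(x), sd = 0.4)

fit <- lm(y ~ x)

# Fitted value: x = 18 is an observed predictor value, so its y was used
# when estimating the model
fitted_1 <- fitted(fit)[1]

# Genuine forecast: x = 30 is not among the observed predictor values
forecast_30 <- predict(fit, newdata = data.frame(x = 30))

print(fitted_1)
print(forecast_30)
```

Both numbers come from the same estimated equation; only the first depends on an observation that was part of the estimation sample.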

Assuming that the regression errors are normally distributed, an approximate 95% forecast interval (also called a prediction interval) associated with this forecast is given by $$\label{eq-4-pi}\tag{4.2} \hat{y} \pm 1.96 s_e\sqrt{1+\frac{1}{N}+\frac{(x-\bar{x})^2}{(N-1)s_x^2}},$$ where $N$ is the total number of observations, $\bar{x}$ is the mean of the observed $x$ values, $s_x$ is the standard deviation of the observed $x$ values and $s_e$ is given by equation (4.1). Similarly, an 80% forecast interval can be obtained by replacing 1.96 by 1.28 in equation (\ref{eq-4-pi}). Other forecast intervals can be obtained by replacing the 1.96 with the appropriate value given in Table 2.1. If R is used to obtain forecast intervals (as in the example below), the calculations are more exact (especially for small values of $N$) than those given by equation (\ref{eq-4-pi}).
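Equation (\ref{eq-4-pi}) can be checked numerically. The sketch below uses simulated data (not the Car data) and assumes that $s_e$ in equation (4.1) is the residual standard error with $N-2$ degrees of freedom, which is what `summary(fit)$sigma` returns. It compares the approximate interval with the one from `predict()`, which uses a $t$ quantile in place of 1.96.

```r
# Simulated data (hypothetical; not the Car data used in the text)
set.seed(1)
N <- 20
x <- runif(N, 15, 40)
y <- 12.5 - 0.22 * x + rnorm(N, sd = 0.5)
fit <- lm(y ~ x)

x_new <- 30
y_hat <- unname(predict(fit, newdata = data.frame(x = x_new)))

# Assumption: s_e is the residual standard error (N - 2 degrees of freedom)
s_e <- summary(fit)$sigma

# Square-root term of equation (4.2); var(x) is s_x^2
root_term <- sqrt(1 + 1 / N + (x_new - mean(x))^2 / ((N - 1) * var(x)))

# Approximate 95% interval using the normal multiplier 1.96
approx_95 <- y_hat + c(-1, 1) * 1.96 * s_e * root_term

# Same formula with a t quantile instead of 1.96
t_95 <- y_hat + c(-1, 1) * qt(0.975, df = N - 2) * s_e * root_term

# Interval computed by R
exact_95 <- predict(fit, newdata = data.frame(x = x_new),
                    interval = "prediction", level = 0.95)

print(approx_95)
print(t_95)
print(exact_95)
```

With only 20 observations the $t$ multiplier is larger than 1.96, so the interval from `predict()` is somewhat wider than the approximate one.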

Equation (\ref{eq-4-pi}) shows that the forecast interval is wider when $x$ is far from $\bar{x}$. That is, we are more certain about our forecasts when considering values of the predictor variable close to its sample mean.
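The widening of the interval can be seen directly by computing intervals at increasing distances from $\bar{x}$. This is a sketch on simulated data (not the Car data); the distances 0, 5, 10 and 20 are arbitrary choices for illustration.

```r
# Simulated data (hypothetical; not the Car data used in the text)
set.seed(2)
N <- 30
x <- runif(N, 15, 40)
y <- 12.5 - 0.22 * x + rnorm(N, sd = 0.5)
fit <- lm(y ~ x)

# Predictor values at increasing distance from the sample mean
x_bar <- mean(x)
x_new <- x_bar + c(0, 5, 10, 20)

# 95% prediction intervals at each new value
pi95 <- predict(fit, newdata = data.frame(x = x_new), interval = "prediction")

# Interval width at each distance from the mean
width <- unname(pi95[, "upr"] - pi95[, "lwr"])
print(data.frame(distance = x_new - x_bar, width = width))
```

The interval is narrowest at $x=\bar{x}$, where the last term under the square root in equation (\ref{eq-4-pi}) is zero.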

The estimated regression line in the Car data example is $$\hat{y}=12.53-0.22x.$$ For the Chevrolet Aveo (the first car in the list), $x_1=25$ mpg and $y_1=6.6$ tons of CO$_2$ per year. The model returns a fitted value of $\hat{y}_1=7.00$, i.e., $e_1=-0.4$. For a car with city driving fuel economy $x=30$ mpg, the forecast average footprint is $\hat{y}=5.90$ tons of CO$_2$ per year. The corresponding 95% and 80% forecast intervals are $\lbrack 4.95, 6.84 \rbrack$ and $\lbrack 5.28, 6.51 \rbrack$ respectively (calculated using R).
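As a rough check, the rounded coefficients can be substituted directly into the estimated line: $$\hat{y}_1 \approx 12.53 - 0.22\times 25 = 7.03, \qquad \hat{y} \approx 12.53 - 0.22\times 30 = 5.93.$$ These differ slightly from the reported values of 7.00 and 5.90, presumably because R works with unrounded coefficients (an assumption; the unrounded values are not given here). The residual for the first car follows from the reported fitted value: $$e_1 = y_1 - \hat{y}_1 = 6.6 - 7.00 = -0.4.$$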

Figure 4.6: Forecast with 80% and 95% forecast intervals for a car with $x=30$ mpg in city driving.

R code
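# Assumes the forecast package is loaded and that `fit` is the linear model
# estimated earlier in the chapter (its definition is not shown in this section).
fitted(fit)[1]  # fitted value for the first car in the data
fcast <- forecast(fit, newdata=data.frame(City=30))  # forecast for City = 30 mpg
plot(fcast, xlab="City (mpg)", ylab="Carbon footprint (tons per year)")
# The displayed graph uses jittering, while the code above does not.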