# 4.3 Regression and correlation

The correlation coefficient $r$ was introduced in Section 2.2. Recall that $r$ measures the strength and direction (positive or negative) of the linear relationship between two variables: the stronger the linear relationship, the more closely the observed data points cluster around a straight line.

Correlation and regression are therefore closely linked. The advantage of a regression model over correlation alone is that it specifies a predictive relationship between the two variables ($x$ predicts $y$) and quantifies that relationship in a form that can be used for forecasting.
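As a minimal sketch of how the two ideas connect, the R code below uses simulated data to check two standard identities for simple linear regression: the least-squares slope equals $r\,s_y/s_x$, and the $R^2$ of the fitted model equals $r^2$. The variables `x` and `y`, the seed and the parameter values are illustrative assumptions, not the fuel data used later.

```r
# Simulated data, for illustration only (not the fuel data)
set.seed(1)
x <- runif(50, min = 10, max = 40)          # a predictor, e.g. city mpg
y <- 12 - 0.2 * x + rnorm(50, sd = 0.5)     # a response with a negative linear trend

r   <- cor(x, y)                            # correlation coefficient
fit <- lm(y ~ x)                            # least-squares regression of y on x
b1  <- unname(coef(fit)["x"])               # estimated slope

# The slope is the correlation rescaled by the ratio of standard deviations
slope_from_r <- r * sd(y) / sd(x)

# In simple regression (with an intercept), R-squared is the squared correlation
r_squared <- summary(fit)$r.squared

c(slope = b1, slope_from_r = slope_from_r)
c(r_squared = r_squared, r2 = r^2)
```

Note that `cor(x, y)` is symmetric in the two variables, whereas the regression is not: regressing `x` on `y` gives a different line, which is why regression, unlike correlation, expresses a direction of prediction.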

## Example 4.1 Car emissions

Data on the carbon footprint and fuel economy of 2009 model cars were first introduced in Chapter 1. A scatter plot of Carbon (carbon footprint in tonnes per year) versus City (fuel economy in city driving conditions, in miles per gallon) for all 134 cars is presented in Figure 4.3. Also plotted is the estimated regression line $$\hat{y}=12.53-0.22x,$$ where $x$ is City and $\hat{y}$ is the predicted Carbon.

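```r
# Scatter plot of carbon footprint against city fuel economy
plot(Carbon ~ City, ylab="Carbon footprint (tons per year)", data=fuel)

# Fit the simple linear regression of Carbon on City and add the fitted line
fit <- lm(Carbon ~ City, data=fuel)
abline(fit)

# Print the coefficient table and summary statistics
summary(fit)
```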

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.525647   0.199232   62.87   <2e-16 ***
City        -0.220970   0.008878  -24.89   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4703 on 132 degrees of freedom
Multiple R-squared:  0.8244,  Adjusted R-squared:  0.823
F-statistic: 619.5 on 1 and 132 DF,  p-value: < 2.2e-16
```

The regression estimation output from R is also shown. Notice the coefficient estimates in the column labelled “Estimate”. The other features of the output will be explained later in this chapter.
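Because this is a simple regression, the "Multiple R-squared" in the output is the square of the correlation between Carbon and City, and $r$ takes the sign of the slope. A small sketch recovering $r$ from the printed values (copied by hand from the output, so the result is only as precise as that rounding):

```r
# Values copied from the regression output above
r_squared <- 0.8244       # "Multiple R-squared"
slope     <- -0.220970    # estimated coefficient of City

# In simple regression r^2 = R^2, and r has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 3)   # -0.908
```

With the `fuel` data frame loaded, `cor(fuel$City, fuel$Carbon)` should agree with this value up to the rounding of the printed $R^2$, confirming a strong negative linear relationship.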

**Interpreting the intercept**, $\hat{\beta}_0=12.53$. A car that
has a fuel economy of $0$ mpg in city driving conditions can expect an
average carbon footprint of $12.53$ tonnes per year. As often happens
with the intercept, this interpretation is nonsensical, because it is
impossible for a car to have a fuel economy of $0$ mpg.

The intercept can be interpreted only if a value of $x=0$ makes sense. When it does, $\hat{\beta}_0$ is the predicted value of $y$ corresponding to $x=0$. Even when $x=0$ does not make sense, the intercept is an important part of the model: omitting it forces the fitted line through the origin, which can unnecessarily distort the slope coefficient.
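To see the distortion, the sketch below fits the same simulated data with and without an intercept. The data (true intercept 12, true slope $-0.2$, $x$ between 10 and 40) are an arbitrary assumption chosen so that $x=0$ lies far outside the observed range, much as $0$ mpg does for the car data; `y ~ 0 + x` is R's formula syntax for a model without an intercept.

```r
# Simulated data, for illustration only: true intercept 12, true slope -0.2
set.seed(42)
x <- runif(50, min = 10, max = 40)
y <- 12 - 0.2 * x + rnorm(50, sd = 0.5)

with_intercept    <- lm(y ~ x)       # usual model: y = b0 + b1 x
without_intercept <- lm(y ~ 0 + x)   # line forced through the origin

coef(with_intercept)     # slope close to the true value -0.2
coef(without_intercept)  # slope badly distorted; here it even comes out positive
```

Forcing the line through the origin makes the single remaining coefficient absorb the missing intercept, so the fitted slope no longer describes how $y$ changes with $x$ within the observed range.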

**Interpreting the slope**, $\hat{\beta}_1=-0.22$. For every extra
mile per gallon, a car’s carbon footprint will decrease on average by
0.22 tonnes per year. Alternatively, if we consider two cars whose fuel
economies differ by 1 mpg in city driving conditions, their carbon
footprints will differ, on average, by 0.22 tonnes per year (with the
car travelling further per gallon of fuel having the smaller carbon
footprint).
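Because the model is linear, the difference between the predictions for two cars depends only on the difference in their fuel economies, not on where they sit on the mpg scale. A minimal check, using the unrounded coefficients from the R output; `carbon_hat` is a hypothetical helper name and the mpg values 20, 21 and 25 are arbitrary illustrations:

```r
# Coefficients copied from the regression output
b0 <- 12.525647   # intercept
b1 <- -0.220970   # slope on City (mpg)

carbon_hat <- function(city) b0 + b1 * city   # fitted line: predicted tonnes per year

carbon_hat(25)    # forecast for a car with 25 mpg: about 7.00 tonnes per year

# Two cars whose city fuel economies differ by 1 mpg
diff_1mpg <- carbon_hat(21) - carbon_hat(20)
diff_1mpg         # equals b1, about -0.22 tonnes per year

# A 5 mpg difference changes the prediction by 5 * b1
carbon_hat(25) - carbon_hat(20)   # about -1.10 tonnes per year
```

Using the rounded equation $\hat{y}=12.53-0.22x$ instead would give 7.03 tonnes per year at 25 mpg; the small gap comes only from rounding the coefficients.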