# 11 Generalized linear models

We have seen in an earlier chapter that the general linear model
(function `lm()`) is a flexible tool for conducting a suite of different
analyses. However, biologists frequently deal with data that do not
adhere to the assumptions of the general linear model. In particular,
the assumption that the residuals are normally distributed is frequently
troublesome. Consider the following data, which record the deaths of
tobacco budworms exposed to varying concentrations of
trans-cypermethrin. They come from an example on p. 190 of [32].

> y = c(1, 4, 9, 13, 18, 20)

where `y` is the number of individuals that died at each concentration. This number means little without knowing how many individuals were exposed in each group. Here, 20 individuals were exposed in each group.
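The plotting commands in this section use `p` and `conc`, which are not defined in the text above. A minimal sketch of how they could be constructed is given here. The death counts and the group size of 20 come from the text; the concentration values are an assumption (doses doubling from 1 to 32, consistent with the $\log_{2}$ spacing described in this section), so substitute the actual values if they differ.

```r
# Number of deaths per group (from the text)
y <- c(1, 4, 9, 13, 18, 20)

# Number of individuals exposed per group (from the text)
n <- 20

# Proportion of individuals that died in each group
p <- y / n

# ASSUMPTION: concentrations double from 1 to 32; these values
# are not stated in the text and are used for illustration only
conc <- 2^(0:5)
```

Dividing by the group size puts every group on the same 0 to 1 scale, which is what makes the groups comparable in a plot.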

Now, let’s look at the proportion of individuals that died in each group (Fig. 1.1a).

> plot(conc, p, xlab = "Concentration(ppb)", ylab = "Proportion",
+ pch = 16, cex.lab = 1.5, cex.axis = 1.5)

As is the case in most dose-response studies, the concentrations used
in this experiment were spaced on a log scale, so we will log-transform
them. Here, the concentrations were based on a $\log_{2}$ scale rather
than the more usual $\log_{10}$. We can easily implement this using `log2()`.
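The transformation itself can be sketched as follows. The name `logconc` matches the plotting commands in this section; the concentration values are the same illustrative assumption (doubling doses from 1 to 32) and are not taken from the text.

```r
# ASSUMPTION: concentrations double from 1 to 32; these values
# are not stated in the text and are used for illustration only
conc <- 2^(0:5)

# Base-2 logarithm of the concentrations
logconc <- log2(conc)

# log2(x) is equivalent to log(x, base = 2)
same <- isTRUE(all.equal(logconc, log(conc, base = 2)))
```

With doubling doses, the transformed values are equally spaced one unit apart, which is why a $\log_{2}$ scale is convenient for this kind of design.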

> plot(logconc, p, xlab = "log(Concentration(ppb))",
+ ylab = "Proportion", pch = 16, cex.lab = 1.5,
+ cex.axis = 1.5)

**Figure 1.1**: Scatterplots of the data.

Based on an examination of Fig. 1.1b, we could choose to just fit a linear regression using log(Concentration) as the independent variable (Fig. 1.2).

> m = lm(p ~ logconc)
> plot(logconc, p, xlab = "log(Concentration(ppb))",
+ ylab = "Proportion", pch = 16, cex.lab = 1.5,
+ cex.axis = 1.5)
> abline(m)

However, there are a couple of problems with this approach. First, we know that the proportions (our response variable) are not normally distributed: they arise from counts out of 20 individuals and are bounded between 0 and 1. Hence, linear regression is not really appropriate. More importantly, the model makes predictions that do not make sense. Extended far enough in either direction, the fitted line predicts proportions above 1 and below 0, which are impossible.
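The out-of-range predictions can be checked directly with `predict()`. The sketch below refits the straight line and evaluates it one $\log_{2}$ step below the lowest dose and one step above the highest. The dose values and the two new evaluation points are illustrative assumptions, not taken from the text.

```r
# Deaths per group and group size (from the text)
y <- c(1, 4, 9, 13, 18, 20)
p <- y / 20

# ASSUMPTION: doses double from 1 to 32, so log2 doses run from 0 to 5
logconc <- log2(2^(0:5))

# Ordinary linear regression of proportion on log2 concentration
m <- lm(p ~ logconc)

# Hypothetical new doses: one step below and one step above the design
new_doses <- data.frame(logconc = c(-1, 6))
pred <- predict(m, newdata = new_doses)
print(pred)

# Flag predictions that are valid proportions
in_range <- pred >= 0 & pred <= 1
print(in_range)
```

Under these assumed doses, the prediction below the lowest dose is negative and the one above the highest dose exceeds 1, so neither is a valid proportion.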