# 11 Generalized linear models

We have seen in an earlier chapter that the general linear model (function lm()) is a flexible tool for conducting a suite of different analyses. However, biologists frequently deal with data that do not adhere to the assumptions of the general linear model. In particular, the assumption that the residuals are normally distributed is frequently troublesome. Consider the following data, which record the mortality of tobacco budworms exposed to varying concentrations of trans-cypermethrin. They come from an example on pg. 190 of [32].

>  conc = c(1, 2, 4, 8, 16, 32)
>  y = c(1, 4, 9, 13, 18, 20)

where y represents the number of individuals that died at each concentration. Obviously, this number does not mean much without knowing how many individuals were actually exposed in each group. In this experiment, 20 individuals were exposed in each group.

>  n = c(20, 20, 20, 20, 20, 20)

Now, let’s look at the proportion of individuals that died in each group (Fig. 1.1a).

>  p = y/n
>  plot(conc, p, xlab = "Concentration(ppb)", ylab = "Proportion",
+ pch = 16, cex.lab = 1.5, cex.axis = 1.5)

As is the case in most dose-response studies, the concentrations used in this experiment were chosen on a log scale, so we will log-transform them. Here, the concentrations were spaced on a $\log_{2}$ scale (each concentration is double the previous one) rather than the more usual $\log_{10}$ scale. We can easily implement this using log2().

>  logconc = log2(conc)
>  plot(logconc, p, xlab = "log(Concentration(ppb))",
+ ylab = "Proportion", pch = 16, cex.lab = 1.5,
+ cex.axis = 1.5)

Figure 1.1: Scatterplots of the data. (a) Linear scale. (b) Log scale.
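
As a quick check on the transformation, the sketch below confirms that successive concentrations are doublings, so they should fall at equally spaced points on the $\log_{2}$ scale. The name `logconc_check` is introduced here only for illustration and is not part of the original example.

```r
# Concentrations from the text (each is double the previous one)
conc = c(1, 2, 4, 8, 16, 32)

# log2() maps each doubling to a step of one unit
logconc_check = log2(conc)
print(logconc_check)

# Spacing between adjacent doses on the log2 scale
print(diff(logconc_check))
```

Equal spacing on the transformed axis is what makes the points in the log-scale panel spread out evenly, instead of bunching up at low concentrations as they do on the linear scale.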

Based on an examination of Fig. 1.1b, we could choose to simply fit a linear regression using log(Concentration) as the independent variable (Fig. 1.2).

>  m = lm(p ~ logconc)
>  plot(logconc, p, xlab = "log(Concentration(ppb))",
+ ylab = "Proportion", pch = 16, cex.lab = 1.5,
+ cex.axis = 1.5)
>  abline(m)

Figure 1.2: Scatterplot showing the linear model fit.

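However, there are a couple of problems with this approach. First, we know that the proportions (our response variable) are not normally distributed: they are bounded between 0 and 1 and arise from counts of deaths out of a fixed number of individuals. Hence, linear regression is not really appropriate. More importantly, the model makes predictions that do not make sense. The fitted line rises above 1 at the highest concentration and falls below 0 if we extrapolate to concentrations below the lowest one tested. Proportions outside the range 0 to 1 are impossible.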
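
To make the problem with the predictions concrete, the following sketch refits the linear model and asks for predicted proportions over a range of concentrations that extends slightly beyond those tested. The grid of concentrations in `newconc` and the names `pred` and `out_of_range` are illustrative choices, not part of the original example.

```r
# Data from the text
conc = c(1, 2, 4, 8, 16, 32)
y = c(1, 4, 9, 13, 18, 20)
n = c(20, 20, 20, 20, 20, 20)
p = y/n
logconc = log2(conc)

# Linear model fitted to the proportions, as in the text
m = lm(p ~ logconc)

# Hypothetical grid: from half the lowest to twice the highest concentration
newconc = c(0.5, 1, 2, 4, 8, 16, 32, 64)
pred = predict(m, newdata = data.frame(logconc = log2(newconc)))

# Flag predictions that are not valid proportions
out_of_range = pred < 0 | pred > 1
print(data.frame(conc = newconc, predicted = round(pred, 3), out_of_range))
```

With these data, the flagged rows should include the highest tested concentration (32 ppb), where the fitted line already exceeds 1, and the extrapolated concentration of 0.5 ppb, where it drops below 0.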