# 11.4 The loglinear model

We have just seen how the general linear model can be extended to deal with a binary response variable. Our approach was to model the proportion of observations that belonged to one class or the other using logistic regression. Another sort of response variable that biologists frequently deal with is counts (integers). To illustrate, consider the following data on bromeliads that were collected from a tropical forest. These data were first introduced back in Section ?? . Within each of these bromeliads, a community of aquatic invertebrate species exists in a natural microcosm. All of the individuals from these communities were enumerated and identified. The hypothesis is that, as the amount of detritus that is found within these communities increases, the number of detritivore species should also increase. Thus, we want to model detritivore richness as a function of detritus. The data are in a file called “bromeliad.csv”

> str(d)

'data.frame': 190 obs. of 5 variables:

$ plant : int 1 2 3 4 5 6 7 8 9 10 ...

$ forest : int 1 1 1 1 1 1 1 1 1 1 ...

$ year : int 1 1 1 1 1 1 1 1 1 1 ...

$ debris.wt: num 17 17.4 52.1 44.5 43.4 ...

$ S.d : int 9 7 11 9 11 7 5 6 7 19 ...

We see that the data frame has five variables. `S.d`

is detritivore
richness (i.e., the number of detritivore species), `debris.wt`

is the
amount of detritus. To start, let’s fit a linear model using a log
transform for the biomass of detritus
(Fig.1.6).

> summary(m)

Call:

lm(formula = S.d ~ log(debris.wt), data = d)

Residuals:

Min 1Q Median 3Q Max

-7.8392 -2.0483 -0.2904 1.7479 8.1964

Coefficients:

Estimate Std. Error t value Pr( > |t|)

(Intercept) 3.2258 0.5136 6.281 2.28e-09 ***

log(debris.wt) 2.3236 0.1793 12.958 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.015 on 188 degrees of freedom

Multiple R-squared: 0.4718, Adjusted R-squared: 0.469

F-statistic: 167.9 on 1 and 188 DF, p-value: < 2.2e-16

> plot(d$S.d ~ log(d$debris.wt), pch = 16, xlab = "log(detritus biomass)",

+ ylab = "Detritivore Species Richness", xlim = c(-3,

+ 5), ylim = c(-5, 23))

> lines(c(-4, 6), c(0, 0), lty = 2)

> abline(m)

Clearly, there is an increase in species richness as the biomass of detritus increases. However, there are some potential problems with our model. First, we know that richness, an integer, is not normally distributed. In addition, the model predicts negative values of detritivore richness. This is biologically impossible. Furthermore, inspection of diagnostic plots reveals another problem with integer response variables.

> plot(m)

In the top left panel of Fig.1.7, we see
that the cloud of residuals is shaped a bit like a ’ < ’. As the fitted
values increase (x-axis), the spread in the residuals increases (width
of the cloud in the direction of the y-axis). This indicates that the
variance of `S.d`

is not constant (i.e., heteroscedastic). In fact, this
pattern frequently is encountered when dealing with count data. One
approach to dealing with this is to log-transform the dependent variable
- doing so can help resolve heteroscedasticity. However, in this case,
the distribution of errors will still be non-normal.

The loglinear model allows us to deal with such data. It is a GLM with a Poisson error distribution and a log link function. Thus, we assume

and the loglinear model is:

Let’s fit this model to the bromeliad data.

+ d)

> summary(m2)

Call:

glm(formula = S.d ~ log(debris.wt), family = poisson, data = d)

Deviance Residuals:

Min 1Q Median 3Q Max

-2.76047 -0.72079 -0.07123 0.56187 2.33552

Coefficients:

Estimate Std. Error z value Pr( > |z|)

(Intercept) 1.45353 0.06859 21.19 < 2e-16 ***

log(debris.wt) 0.27679 0.02169 12.76 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 350.63 on 189 degrees of freedom

Residual deviance: 175.06 on 188 degrees of freedom

AIC: 933.98

Number of Fisher Scoring iterations: 4

Interpreting the model summary is much the same as it was in logistic regression. The number of iteratively reweighted least squares iterations is reported as the “Number of Fisher Scoring iterations”. Wald tests are reported for each of the individual parameter estimates, and the output indicates that the loglinear model for these data is

$ln(\hat y)=$1.45$+$0.28$ \times ln(debris.wt)$.

As in logistic regression, the likelihood ratio $\chi ^{2}$ test available in the ANODEV table (see below) is a little more robust than the Wald test for the slope, especially when sample size is small. In this case, with n = 190, this is not a major concern. The dispersion parameter is again assumed to be 1. This means that we are assuming that the theoretical expectation for the variance holds. For a Poisson distribution, this expectation is is that the variance is equal to $\lambda $ (which is also the mean). Finally, information on Null and Residual deviance also is reported. Using formula 1.2, 50.1% of the variability in the number of detritivore species is accounted for by the model.

Analysis of Deviance Table

Model: poisson, link: log

Response: S.d

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev P( > |Chi|)

NULL 189 350.63

log(debris.wt) 1 175.57 188 175.06 4.501e-40

\begin{verbatim}

> plot(d$S.d ~ log(d$debris.wt), pch = 16, xlab = "log(detritus biomass)",

+ ylab = "Detritivore Species Richness")

> abline(m, lwd = 2)

> lines(c(-4, 6), c(0, 0), lty = 2)

> newdata = data.frame(debris.wt = seq(0.1, 150,

+ by = 0.1))

> lines(log(newdata$debris.wt), predict(m2, newdata,

+ type = "response"), col = "blue", lwd = 2)

> legend(-1.5, 20, c("Linear", "Loglinear"), col = c("black",

+ "blue"), lty = c(1, 1), lwd = c(2, 2))

In Fig.1.8, we see that the loglinear model does not give rise to negative richness predictions. In addition, because of the log link function, the model is not straight.