11.2.5 Analysis of covariance

Now, we can just fit the full model that includes both dose and sex.

     >  m3 = glm(cbind(dead, alive) ~ sex * logconc, family = binomial("logit"),
    + data = d)
     >  summary(m3)

    Call:
    glm(formula = cbind(dead, alive) ~ sex * logconc, family = binomial("logit"),
     data = d)

    Deviance Residuals:
     Min 1Q Median 3Q Max
    -1.39849 -0.32094 -0.07592 0.38220 1.10375

    Coefficients:
     Estimate Std. Error z value Pr( > |z|)
    (Intercept) -2.9935 0.5527 -5.416 6.09e-08 ***
    sexm 0.1750 0.7783 0.225 0.822
    logconc 0.9060 0.1671 5.422 5.89e-08 ***
    sexm:logconc 0.3529 0.2700 1.307 0.191
    ---
    Signif. codes: 0***0.001**0.01*0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

     Null deviance: 124.8756 on 11 degrees of freedom
    Residual deviance: 4.9937 on 8 degrees of freedom
    AIC: 43.104

    Number of Fisher Scoring iterations: 4

     >  anova(m3, test = "Chisq")

    Analysis of Deviance Table

    Model: binomial, link: logit

    Response: cbind(dead, alive)

    Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P( > |Chi|)
    NULL 11 124.876
    sex 1 6.077 10 118.799 0.014
    logconc 1 112.042 9 6.757 3.499e-26
    sex:logconc 1 1.763 8 4.994 0.184

Because the interaction between sex and logconc is not significant, we have no evidence that the slope of the relationship between logit(p) and dose differs between males and females. Hence, we can simplify the model.

     >  m4 = glm(cbind(dead, alive) ~ logconc + sex, family = binomial("logit"),
    + data = d)
     >  summary(m4)

    Call:
    glm(formula = cbind(dead, alive) ~ logconc + sex, family = binomial("logit"),
     data = d)

    Deviance Residuals:
     Min 1Q Median 3Q Max
    -1.10540 -0.65343 -0.02225 0.48471 1.42944

    Coefficients:
     Estimate Std. Error z value Pr( > |z|)
    (Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***
    logconc 1.0642 0.1311 8.119 4.70e-16 ***
    sexm 1.1007 0.3558 3.093 0.00198 **
    ---
    Signif. codes: 0***0.001**0.01*0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

     Null deviance: 124.876 on 11 degrees of freedom
    Residual deviance: 6.757 on 9 degrees of freedom
    AIC: 42.867

    Number of Fisher Scoring iterations: 4

And we can get the odds ratios.

     >  exp(coef(m4))

    (Intercept) logconc sexm
     0.031019 2.898560 3.006400

Now we see that for a given sex, if you increase the $log_{2}$(Concentration) by one unit, the odds of death increases 2.9 times. In addition, after controlling for logconc, the odds of death for males is 3.01 times the odds of death for females. This is the same thing as saying, for a given concentration, the odds of death for males is 3.01 times the odds of death for females.

To extract the LC50 values, we need to have a slope and intercept for each sex. However, as it is currently parameterized, the model coefficients contain a common slope, but the estimate of the intercept for males is really the change in intercept as you go from females to males. We can get the actual intercept for males by re-parameterizing the model. By adding -1 to the model formula, we tell R to drop the overall intercept from the model. Now, look at the coefficients.

     >  m4 = glm(cbind(dead, alive) ~ logconc + sex -
    + 1, family = binomial("logit"), data = d)
     >  summary(m4)

    Call:
    glm(formula = cbind(dead, alive) ~ logconc + sex - 1, family = binomial("logit"),
     data = d)

    Deviance Residuals:
     Min 1Q Median 3Q Max
    -1.10540 -0.65343 -0.02225 0.48471 1.42944

    Coefficients:
     Estimate Std. Error z value Pr( > |z|)
    logconc 1.0642 0.1311 8.119 4.70e-16 ***
    sexf -3.4732 0.4685 -7.413 1.23e-13 ***
    sexm -2.3724 0.3855 -6.154 7.56e-10 ***
    ---
    Signif. codes: 0***0.001**0.01*0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

     Null deviance: 126.227 on 12 degrees of freedom
    Residual deviance: 6.757 on 9 degrees of freedom
    AIC: 42.867

    Number of Fisher Scoring iterations: 4

We can now pass these coefficients to dose.p(). As before, the LC50’s are reported in the $log_{2}$ concentration scale. We will backtransform these to the original units. For males, we have:

     >  lc50.males = dose.p(m4, c(3, 1), 0.5)
     >  lc50.males

     Dose SE
    p = 0.5: 2.229262 0.2259649

     >  2^lc50.males[1]

    p = 0.5:
    4.688941

And for females, we have:

     >  lc50.females = dose.p(m4, c(2, 1), 0.5)
     >  lc50.females

     Dose SE
    p = 0.5: 3.263587 0.2297539

     >  2^lc50.females

     Dose SE
    p = 0.5: 9.60368 0.2297539

Like the odds ratios, the LC50 values illustrate the degree to which males are much more sensitive than females. The model predicts that it takes a lower dose to kill 50% of the males (4.69 ppb as opposed to 9.6 ppb). For the sake of completeness, we can also use anova() to test for overall effects.

     >  anova(m4, test = "Chisq")

    Analysis of Deviance Table

    Model: binomial, link: logit

    Response: cbind(dead, alive)

    Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P( > |Chi|)
    NULL 12 126.227
    logconc 1 20.621 11 105.606 5.598e-06
    sex 2 98.849 9 6.757 3.429e-22

This ANODEV table is analogous to the traditional ANCOVA table of the general linear model (see section). The p-value for logconc indicates a significant effect of concentration, and the p-value for sex indicates significant differences among sexes, after controlling for logconc.

There are a number of ways we could plot the model predictions. For example, we could do it using code similar to that used in Fig.1.3 for males and females separately. Here, let’s use curve(). This will take any function that is passed to it, evaluate it along a continuum, and plot the results as a curve. So, before we can use it, we must create a function that will give model predictions. We will do this for the males (call it pm), and females (pf). We will also add a legend using legend().

     >  d$p = d$dead/(d$dead + d$alive)
     >  par(mar = c(5, 5, 4, 2))
     >  plot(d$logconc, d$p, col = as.numeric(d$sex),
    + xlab = "log(Concentration(ppb))", ylab = "Proportion",
    + pch = 16, cex.lab = 1.5, cex.axis = 1.5)
     >  pm = function(x) predict(m4, newdata = data.frame(logconc = x,
    + sex = as.factor("m")), type = "response")
     >  pf = function(x) predict(m4, newdata = data.frame(logconc = x,
    + sex = as.factor("f")), type = "response")
     >  curve(pm, 0, 5, add = T, col = "red")
     >  curve(pf, 0, 5, add = T)
     >  legend(0, 1, legend = c("Males", "Females"), col = c("red",
    + "black"), lty = 1, pch = 16, bty = "n")

Figure 1.4: Scatterplot showing the logit model fit of males and females.