# 11.3 The probit model

Before moving on to response variables with other error
distributions, it is worth briefly mentioning another link function that
is sometimes used for proportion data: the “probit”. The probit (i.e.,
*prob*ability un*it*) is defined as:

\begin{equation} \mathrm{probit}(p) = \Phi^{-1}(p) \end{equation}

where $\Phi^{-1}$ is the inverse of the standard normal cumulative
distribution function (CDF). Probits can be thought of as expressing the
proportions as the number of standard deviations from the mean of a
standard normal distribution, or, more specifically, they are the
quantiles returned by `qnorm()`.
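As a quick check on the definition, the sketch below converts a few proportions to probits with `qnorm()` and back again with `pnorm()`. The proportions are illustrative values chosen for this example, not taken from any data set in this chapter, and `qlogis()` is included only to show the logit of the same proportions alongside.

```r
# Illustrative proportions (assumed values, not from the chapter's data)
p <- c(0.025, 0.16, 0.5, 0.84, 0.975)

probit <- qnorm(p)    # inverse of the standard normal CDF
logit  <- qlogis(p)   # log(p / (1 - p)), shown for comparison

# Side-by-side table of the two transformations
round(cbind(p, probit, logit), 3)

# pnorm() is the standard normal CDF, so it maps probits back to proportions
p_back <- pnorm(probit)
all.equal(p_back, p)
```

A proportion of 0.5 has a probit of 0, and proportions of 0.025 and 0.975 lie about 1.96 standard deviations below and above the mean.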

Probits were first proposed in an influential paper, and they have a long history in toxicological research. In most cases, results obtained with the probit link function will be nearly identical to those obtained with the logit. However, the probit does not perform as well as the logit when the data are sparse at intermediate levels of the predictor variable.

For example, consider the following data on fathead minnow survival when exposed to six different concentrations of sodium pentachlorophenol (NaPCP). The data were originally collected by Weber and colleagues and appear as an example in [18]. At each concentration, four tanks of ten minnows were used. The data are stored in a file called `newman_example5_1.csv`.

    > d = read.csv("newman_example5_1.csv")
    > str(d)
    'data.frame': 24 obs. of 3 variables:
     $ rep      : int 1 2 3 4 1 2 3 4 1 2 ...
     $ conc     : int 0 0 0 0 32 32 32 32 64 64 ...
     $ prop_surv: num 1 1 0.9 0.9 0.8 0.8 1 0.8 0.9 1 ...

The variable `rep` denotes the replicates, `conc` is the concentration
of NaPCP, and `prop_surv` is the proportion of individuals that survived
in each tank. Let’s take a look at the proportion of individuals that
died, `1 - prop_surv`, which we will call `y`. To fit the probit, we just
specify `link = "probit"` within the call to `binomial()` that is passed
as the `family` argument. Also notice that if the link function is not
specified (as in the statement for `m2` below), the default link is the
logit. Because `y` is a proportion rather than a count, the `weights`
argument supplies the number of minnows in each tank (10).

    > # Proportion of minnows that died in each tank
    > d$y = 1 - d$prop_surv
    > # Linear model, logit GLM (default link), and probit GLM
    > m = lm(y ~ log(conc + 1), data = d)
    > m2 = glm(y ~ log(conc + 1), family = binomial,
    +     data = d, weights = rep(10, 24))
    > m3 = glm(y ~ log(conc + 1), family = binomial(link = "probit"),
    +     data = d, weights = rep(10, 24))
    > # Observed proportions with the three fitted curves
    > plot(d$y ~ log(d$conc + 1), xlab = "Ln(Conc+1)",
    +     ylab = "Proportion Dead", pch = 16)
    > abline(m, lwd = 2)
    > newdata = data.frame(conc = seq(0, 512, by = 0.1))
    > lines(log(newdata$conc + 1), predict(m2, newdata,
    +     type = "response"), col = "blue", lwd = 2)
    > lines(log(newdata$conc + 1), predict(m3, newdata,
    +     type = "response"), col = "red", lwd = 2)
    > legend(0, 0.8, c("Linear", "Logit", "Probit"),
    +     col = c("black", "blue", "red"), lty = c(1, 1, 1),
    +     lwd = c(2, 2, 2))
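The two `glm()` calls above can also be tried without the data file. The following is a minimal sketch that uses hypothetical dose-response counts generated from an assumed logistic curve; the object names (`sim`, `fit_logit`, `fit_probit`, `fit_logit2`) and the values are invented for illustration and are not the minnow data. It also shows that a two-column `cbind(successes, failures)` response gives the same fit as a proportion combined with `weights`.

```r
# Hypothetical counts (NOT the minnow data): 7 dose levels, 1000 animals each.
x      <- 0:6
n      <- rep(1000, length(x))
deaths <- round(n * plogis(-3 + x))   # expected deaths under an assumed logistic curve
sim    <- data.frame(x, n, deaths)

# Response given as a two-column matrix of (dead, alive)
fit_logit  <- glm(cbind(deaths, n - deaths) ~ x,
                  family = binomial, data = sim)
fit_probit <- glm(cbind(deaths, n - deaths) ~ x,
                  family = binomial(link = "probit"), data = sim)

# Equivalent specification: proportion as response, group size as weights
fit_logit2 <- glm(deaths / n ~ x, family = binomial,
                  weights = n, data = sim)

# Coefficients are on different scales (log-odds vs. standard deviations) ...
slope_ratio <- unname(coef(fit_logit)["x"] / coef(fit_probit)["x"])

# ... but the fitted proportions are close
max_diff <- max(abs(fitted(fit_logit) - fitted(fit_probit)))

slope_ratio
max_diff
```

Because the two links work on different scales, the coefficients themselves should not be compared directly; it is the fitted proportions that are expected to agree closely.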

In the minnow data (Fig. 1.5), there is a relatively large gap between the controls and the lowest dose. As a result, the probit model tends to overestimate mortality at intermediate doses and underestimate mortality at higher doses, when compared to the logit model (although the differences are slight). Because of such differences, the logit model can be considered a more robust alternative to the probit.
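One commonly cited reason for the difference in behaviour is that the logistic curve has heavier tails than the normal curve. The sketch below compares the two inverse link functions directly, after rescaling the logistic by 1.702; this constant is a conventional approximation and is an assumption of the example, not something estimated from the minnow data.

```r
z <- seq(-6, 6, by = 0.01)
k <- 1.702   # conventional scaling constant (an assumption of this sketch)

# Largest vertical gap between the normal CDF and the rescaled logistic CDF
max_gap <- max(abs(pnorm(z) - plogis(k * z)))

# Ratio of the two lower-tail probabilities at z = -4
tail_ratio <- plogis(k * -4) / pnorm(-4)

max_gap      # small: the curves nearly coincide over most of the range
tail_ratio   # well above 1: the logistic puts more probability in the tail
```

On the probability scale the two curves are almost indistinguishable, which is consistent with the slight differences seen in the fitted models, but the logistic assigns noticeably more probability to extreme values.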