4.2.2 The Neyman-Pearson approach

In his correspondence with Neyman, Pearson began to question Fisher’s treatment of the p-value and the notion of significance. He was particularly disturbed by the temptation to use p-values to try to prove that a hypothesis was true (although Fisher himself never fell into this trap). Neyman and Pearson collaborated on extending Fisher’s ideas, and their work laid the foundation for the traditional statistical approaches that are prevalent in today’s research literature.

The Neyman-Pearson framework extends Fisher’s approach in two specific ways. First, Neyman and Pearson concluded that the test of any hypothesis did not make sense in the absence of an explicit alternative that could be assumed to be correct if the original hypothesis was rejected. Thus, they called the hypothesis that was actually being tested the null hypothesis, denoted $H_{o}$. The alternative hypothesis was denoted $H_{a}$. A small p-value, and hence a significant hypothesis test, meant that $H_{o}$ was rejected in favor of $H_{a}$.

Second, instead of relying on some post hoc determination of the significance of the p-value (i.e., how small does the p-value need to be?), Neyman and Pearson argued that a critical probability level, which could be used to establish whether an observed p-value was significant, must be set prior to conducting the test. This cut-off value is called $\alpha $. If $p \leq \alpha $, then the result of the hypothesis test is significant, and $H_{o}$ can be rejected in favor of $H_{a}$. If $p > \alpha $, then you fail to reject $H_{o}$. Notice that in this framework, the p-value is not really viewed as a measure of the strength of evidence against $H_{o}$ as it was in Fisher’s original framework. Given $\alpha $, $H_{o}$ is either rejected, or it is not.

Together, $H_{o}$ and $H_{a}$ must account for all possible values of the parameter of interest, and the relationship between the two can be illustrated using our example. We are interested in the hypothesis that the mean of the population is 5. From this, three separate pairs of hypotheses can be developed$^{1}$.

$H_{o}: \mu \geq 5$ versus $H_{a}: \mu < 5$

$H_{o}: \mu = 5$ versus $H_{a}: \mu \neq 5$

$H_{o}: \mu \leq 5$ versus $H_{a}: \mu > 5$

As in Fisher’s approach, to test any of these hypotheses, we focus on the sampling distribution of $\bar y$, assuming that $\mu = 5$. The critical region of this sampling distribution is the set of values of the observed test statistic for which $H_{o}$ would be rejected. Its size is determined by $\alpha $ and its location is determined by $H_{a}$ (Fig. 4.2).

(a) $H_{a}:\mu < 5$
(b) $H_{a}:\mu \neq 5$
(c) $H_{a}:\mu > 5$

Figure 4.2: In the Neyman-Pearson approach to hypothesis testing, $H_{o}$ is compared to a specific alternative hypothesis. Here, the sampling distribution of $\bar y$ is shown with the possible alternative hypotheses illustrated. In all cases, the shaded region indicates $\alpha $, which was set at 0.05. Its location is determined by $H_{a}$. In this example, $H_{o}$ is rejected (i.e., the result is significant) only when $H_{a}$ is $\mu > 5$. Thus, the specification of $H_{a}$ is an important step when proceeding with a null hypothesis test.

In Figs. 4.2a and 4.2b, the observed test statistic $\bar y = 6.6$ does not fall in the critical region. As a result, both of those possible hypothesis tests would be nonsignificant. However, in Fig. 4.2c, because the observed test statistic falls in the critical region, $H_{o}$ would be rejected in favor of $H_{a}$, and we would conclude that $\mu > 5$. Thus, the direction of $H_{a}$ is critically important, and can change the overall interpretation of results!
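The boundaries of the three critical regions can also be computed numerically. The following is a minimal sketch, assuming the same sampling distribution used in this section's `pnorm` calls (mean 5, standard deviation `3/sqrt(10)`) and $\alpha = 0.05$ as in Fig. 4.2. The object names (`crit_less`, `crit_two`, `crit_greater`, `in_region`) are illustrative only.

```r
# Sampling distribution of ybar under Ho (mu = 5)
mu0   <- 5
se    <- 3 / sqrt(10)
alpha <- 0.05   # assumed, as in Fig. 4.2
ybar  <- 6.6    # observed test statistic

# (a) Ha: mu < 5  -> all of alpha lies in the left tail
crit_less <- qnorm(alpha, mean = mu0, sd = se)
# (b) Ha: mu != 5 -> alpha is split evenly between the two tails
crit_two <- qnorm(c(alpha / 2, 1 - alpha / 2), mean = mu0, sd = se)
# (c) Ha: mu > 5  -> all of alpha lies in the right tail
crit_greater <- qnorm(1 - alpha, mean = mu0, sd = se)

# Does the observed statistic fall in each critical region?
in_region <- c(
  less      = ybar <= crit_less,
  two.sided = ybar <= crit_two[1] || ybar >= crit_two[2],
  greater   = ybar >= crit_greater
)
in_region
```

Only the `greater` entry is `TRUE`, which matches the reading of Fig. 4.2c.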

The graphical approach of comparing the location of the observed test statistic to the critical region is equivalent to comparing the p-value to $\alpha $. This is easily accomplished for each of the possible sets of hypotheses. For the first set of hypotheses, because of the direction of $H_{a}$, the p-value is the area in the left tail of the sampling distribution.

     >  # left-tail area: P(ybar <= 6.6) when mu = 5
     >  Ha1.p = pnorm(6.6, mean = 5, sd = 3/sqrt(10))
     >  Ha1.p

    [1] 0.9541549

For the second set, $H_{a}$ considers both tails of the distribution. Because our sampling distribution is symmetric, the two-sided p-value is twice the area in the tail beyond the observed test statistic. Here $\bar y = 6.6$ lies above 5, so that tail is the right one.

     >  # two-sided: double the right-tail area (valid here because 6.6 > 5)
     >  Ha2.p = 2 * (1 - pnorm(6.6, mean = 5, sd = 3/sqrt(10)))
     >  Ha2.p

    [1] 0.09169028

Finally, the last $H_{a}$ considers the right tail.

     >  # right-tail area: P(ybar >= 6.6) when mu = 5
     >  Ha3.p = 1 - pnorm(6.6, mean = 5, sd = 3/sqrt(10))
     >  Ha3.p

    [1] 0.04584514
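The three calculations can also be written out explicitly. This is a sketch that reads its values off the `pnorm` calls above, taking $\sigma = 3$ and $n = 10$ (an inference from `sd = 3/sqrt(10)`) and writing $\mu_{0} = 5$ for the value of $\mu$ assumed under $H_{o}$. The symbol $\Phi$, the standard normal cumulative distribution function, is introduced here only for this illustration.

$$
z = \frac{\bar y - \mu_{0}}{\sigma/\sqrt{n}} = \frac{6.6 - 5}{3/\sqrt{10}} \approx 1.687
$$

$$
p_{(a)} = \Phi(z) \approx 0.954, \qquad p_{(b)} = 2\,[1 - \Phi(|z|)] \approx 0.092, \qquad p_{(c)} = 1 - \Phi(z) \approx 0.046
$$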

Again, when comparing $p$ to $\alpha $, the result depends on $H_{a}$. Only for the last set of hypotheses would $H_{o}$ be rejected in favor of $H_{a}$ at $\alpha = 0.05$. This is the reason that the Neyman-Pearson approach emphasizes the importance of specifying $H_{a}$.
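The comparison of each p-value with $\alpha $ can be collected in one place. The following is a minimal sketch, assuming $\alpha = 0.05$ as in Fig. 4.2. The object names (`results`, `reject_Ho`) are illustrative and not part of the original analysis.

```r
# Values from the running example
ybar  <- 6.6
mu0   <- 5
se    <- 3 / sqrt(10)
alpha <- 0.05   # assumed, as in Fig. 4.2

# Tail areas of the sampling distribution of ybar under Ho
lower <- pnorm(ybar, mean = mu0, sd = se)                      # left tail
upper <- pnorm(ybar, mean = mu0, sd = se, lower.tail = FALSE)  # right tail

results <- data.frame(
  Ha = c("mu < 5", "mu != 5", "mu > 5"),
  p  = c(lower, 2 * min(lower, upper), upper)
)
# Neyman-Pearson decision rule: reject Ho when p <= alpha
results$reject_Ho <- results$p <= alpha
results
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(...)`, and `2 * min(lower, upper)` gives the two-sided p-value whichever side of 5 the observed mean falls on.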

When cast as a series of steps, the Neyman-Pearson approach can be described as:

  1. generate a biological hypothesis

  2. translate the biological hypothesis into a statistical hypothesis, frequently done in terms of a parameter

  3. gather data

  4. calculate the observed test statistic

  5. calculate $p$

  6. make a statistical conclusion based upon comparing $p$ to the pre-specified $\alpha $

  7. draw a biological conclusion about the original biological hypothesis of interest

These steps form the foundation of what is sometimes referred to as classical null hypothesis testing.
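The computational steps above (4 through 6) can be sketched as a single R function for the one-sample case used in this section. This is an illustration under stated assumptions, not a general-purpose test: `np_z_test` is a hypothetical helper, the population standard deviation is taken as known ($\sigma = 3$, as implied by `sd = 3/sqrt(10)`), $\alpha $ defaults to 0.05, and the sample `y` is made up so that $n = 10$ and $\bar y = 6.6$.

```r
# Hypothetical helper: one-sample z-test in the Neyman-Pearson style.
# Assumes the population standard deviation `sigma` is known.
np_z_test <- function(y, mu0, sigma,
                      alternative = c("less", "two.sided", "greater"),
                      alpha = 0.05) {
  alternative <- match.arg(alternative)
  ybar <- mean(y)                 # step 4: observed test statistic
  se   <- sigma / sqrt(length(y)) # sd of the sampling distribution of ybar
  lower <- pnorm(ybar, mean = mu0, sd = se)                      # P(Ybar <= ybar)
  upper <- pnorm(ybar, mean = mu0, sd = se, lower.tail = FALSE)  # P(Ybar >= ybar)
  # step 5: the p-value depends on the direction of Ha
  p <- switch(alternative,
              less      = lower,
              greater   = upper,
              two.sided = 2 * min(lower, upper))
  # step 6: statistical conclusion from comparing p to alpha
  list(ybar = ybar, p = p, alpha = alpha, reject = p <= alpha)
}

# Made-up sample (not real data), chosen so that n = 10 and the mean is 6.6
y <- c(3.1, 9.8, 6.4, 5.0, 8.2, 7.7, 4.9, 6.0, 10.3, 4.6)

result <- np_z_test(y, mu0 = 5, sigma = 3, alternative = "greater")
result$p       # approximately 0.0458
result$reject  # TRUE
```

Steps 1 through 3 and step 7 remain the researcher's responsibility. The function covers only the calculation of the test statistic, the p-value, and the statistical conclusion.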

  1. In some cases, you may see all three sets of hypotheses listed with the same $H_{o}: \mu = 5$. However, to be technically correct, and to ensure that the combination of $H_{o}$ and $H_{a}$ delineates all possibilities, $H_{o}$ should be listed as shown.