4.3.2 Decision errors and power

Under the NHT paradigm, one can explicitly consider four possible outcomes of a hypothesis test (Table 4.1).

  1. The investigator can reject $H_{o}$ when it is in fact true. This is called a Type I error.

  2. The investigator can reject $H_{o}$ when it is in fact false. This decision is correct, and the probability that this occurs is called power.

  3. The investigator can fail to reject $H_{o}$ when it is in fact false. This is called a Type II error.

  4. The investigator can fail to reject $H_{o}$ when it is in fact true. This decision also is correct.

The expected probabilities of each of these outcomes typically are considered when planning an experiment. In particular, the two possible errors deserve consideration, and much of Neyman and Pearson’s work described how to quantify these error rates and design experiments to minimize their influence.

Table 4.1: The relationship between statistical decisions and reality. Decision errors are expected to occur, and information about the biological question of interest should determine which decision error (Type I or Type II) is more egregious.

| Real World | Statistical Decision: Reject $H_{o}$ | Statistical Decision: Fail to Reject $H_{o}$ |
| --- | --- | --- |
| $H_{o}$ is True | Type I error ($\alpha $) | Correct ($1-\alpha $) |
| $H_{o}$ is False | Correct (Power = $1-\beta $) | Type II error ($\beta $) |

The Type I error rate is simply the cut-off value $\alpha $ that was defined previously. If we assume that $H_{o}$ is true, then the pdf defined by $H_{o}$ (i.e., the sampling distribution of our test statistic assuming that $H_{o}$ is true) describes the relative likelihood of all possible values of the test statistic. If we were to repeatedly sample the population, how often would we get test statistics that fall in the rejection region, even when $H_{o}$ is true? A proportion $\alpha $ of the time (e.g., 5% of the time when $\alpha = 0.05$). Thus, the Type I error rate is set by the investigator and is the probability that a true null hypothesis is incorrectly rejected. Put another way, if the experiment were repeated a large number of times and $H_{o}$ were true, we would expect $100\alpha $% of the test statistics to fall within the rejection region, each leading us to incorrectly reject $H_{o}$.
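The repeated-sampling interpretation above can be checked with a small simulation. The following is a minimal sketch, not the procedure used to produce the figures in this chapter: it assumes a two-sided z-test of $H_{o}:\mu = 5$ with a known population standard deviation, and the values of `SIGMA`, `N`, and `N_EXPERIMENTS` are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

MU_0 = 5.0              # mean stated by H_o (from the text)
SIGMA = 3.0             # assumed population standard deviation (hypothetical)
N = 10                  # assumed sample size (hypothetical)
ALPHA = 0.05            # Type I error rate chosen by the investigator
N_EXPERIMENTS = 20_000  # number of simulated repeats of the experiment

# Critical values for a two-sided z-test on the sample mean.
se = SIGMA / N ** 0.5
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
lower, upper = MU_0 - z_crit * se, MU_0 + z_crit * se

# Sample repeatedly from a population in which H_o is TRUE and
# count how often the sample mean lands in the rejection region.
rejections = 0
for _ in range(N_EXPERIMENTS):
    xbar = sum(random.gauss(MU_0, SIGMA) for _ in range(N)) / N
    if xbar < lower or xbar > upper:
        rejections += 1

type_i_rate = rejections / N_EXPERIMENTS
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Because every simulated sample comes from a population where $H_{o}$ holds, each rejection is a Type I error, and the observed rejection rate should be close to $\alpha $.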

To avoid making such an error, you might be tempted to set $\alpha $ very low. However, all else being equal, decreasing the probability of a Type I error increases the probability of a Type II error, which is denoted $\beta $. A Type II error is a failure to reject a null hypothesis that is false, that is, a failure to detect a difference that actually exists.
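The trade-off between the two error rates can be sketched numerically. The example below is an illustration under stated assumptions: a two-sided z-test of $H_{o}:\mu = 5$, a true mean of 7 (the competing value used in Figure 4.4), and a standard error of 0.949, which is not given in the text but is back-calculated from the critical values 3.14 and 6.86 on the assumption that $\alpha = 0.05$. The function name `beta_for_alpha` is hypothetical.

```python
from statistics import NormalDist

MU_0 = 5.0   # mean under H_o
MU_1 = 7.0   # mean under the competing hypothesis H_1
SE = 0.949   # assumed standard error (back-calculated, see lead-in)

def beta_for_alpha(alpha, mu_0=MU_0, mu_1=MU_1, se=SE):
    """Type II error rate of a two-sided z-test when the true mean is mu_1."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu_0 - z_crit * se, mu_0 + z_crit * se
    # beta is the area under H_1's pdf between the two critical values,
    # i.e. where we would fail to reject H_o.
    h1 = NormalDist(mu_1, se)
    return h1.cdf(upper) - h1.cdf(lower)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> beta = {beta_for_alpha(alpha):.3f}")
```

With everything else held fixed, lowering $\alpha $ widens the region in which we fail to reject $H_{o}$, so $\beta $ grows.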

These error rates are illustrated in Figure 4.4. To estimate $\beta $, you have to assume that some other competing hypothesis, call it $H_{1}$, is actually true. For example, in Section 4.2.2, we evaluated $H_{o}:\mu = 5$ against $H_{a}:\mu \neq 5$. In Figure 4.2b, to reject $H_{o}$ in favor of $H_{a}$, the observed test statistic had to be greater than 6.86 or less than 3.14. These two critical values, shown in red in Figure 4.4, delineate the critical region within $H_{o}$’s pdf.

Figure 4.4: Illustration of the concept of statistical power. In this example, $H_{o}:\mu = 5$ is being tested against $H_{a}:\mu \neq 5$. To calculate power, $H_{1}:\mu = 7$ is assumed to be true. Power is defined as the area under $H_{1}$’s pdf that falls within the critical region of $H_{o}$, and is shaded light blue. Theoretically, this would also include area in the left tail of $H_{1}$’s pdf, but it is so small that it cannot be seen. The probability of a Type II error, $\beta $, is shown in light red.

Suppose that, instead of $H_{o}:\mu = 5$, the competing hypothesis $H_{1}:\mu = 7$ is actually true. To calculate $\beta $, we consider the pdf generated by $H_{1}$ (i.e., the sampling distribution of our test statistic assuming that $H_{1}$ is true), shown in blue. The area under this curve that corresponds to values of the test statistic for which we would fail to reject $H_{o}$ (i.e., values between the two critical values), shaded in light red, is $\beta $. For this example, $\beta = 0.441$. Assuming that $H_{1}$ is true, we would incorrectly fail to reject $H_{o}$ 44.1% of the time.

On the other hand, the area under this pdf that coincides with the rejection region of $H_{o}$’s pdf is power. It is shaded in light blue, and for this example, power = 0.559. Assuming that $H_{1}$ is true, we would correctly reject $H_{o}$ 55.9% of the time. Figure 4.4 also illustrates the relationship between $\beta $ and power. Specifically,

\[ \text{Power} = 1-\beta . \]
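The values of $\beta $ and power quoted for this example can be reproduced approximately from the critical values alone. This is a sketch under two assumptions that are not stated in this section: that the sampling distribution is normal with the same width under $H_{o}$ and $H_{1}$, and that the critical values 3.14 and 6.86 correspond to $\alpha = 0.05$, which lets us back out the standard error.

```python
from statistics import NormalDist

LOWER_CRIT, UPPER_CRIT = 3.14, 6.86  # critical values from the text
MU_0, MU_1 = 5.0, 7.0                # means under H_o and H_1
ALPHA = 0.05                         # assumed; not stated in this section

# Back out the standard error implied by the upper critical value.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
se = (UPPER_CRIT - MU_0) / z_crit

# Sampling distribution of the mean if H_1 is true.
h1 = NormalDist(MU_1, se)

# beta: area under H_1's pdf where we fail to reject H_o.
beta = h1.cdf(UPPER_CRIT) - h1.cdf(LOWER_CRIT)

# power: area under H_1's pdf inside the rejection region (both tails).
power = h1.cdf(LOWER_CRIT) + (1 - h1.cdf(UPPER_CRIT))

print(f"se = {se:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Computing power from the two tails directly, rather than as $1-\beta $, makes it easy to confirm that the two quantities sum to one and that the left-tail contribution is negligible in this example.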

Here, we have illustrated $\alpha $, $\beta $, and power using a simple example that considers the sampling distribution of the mean. The calculation of power (and $\beta $, for that matter) depended on the $H_{o}$ and $H_{a}$ being tested, the $H_{1}$ assumed to be true, the shape of the sampling distribution (in this case, assumed to be normal), and the width of the sampling distribution (which, in the case of the mean, is influenced by sample size). Similarly, for other hypothesis tests, the details of how to calculate power will depend on the null and alternative hypotheses being tested, the test statistic being used, and the expected shape and width of its sampling distribution. In particular, the influence of sample size on power is often an important aspect of experimental design. These details will be considered in later chapters when discussing various hypothesis tests.
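To illustrate how sample size influences power through the width of the sampling distribution, the sketch below computes power for several sample sizes. It assumes a two-sided z-test on the mean with a known population standard deviation. The value `SIGMA = 3.0`, the sample sizes, and the function name `power_two_sided_z` are hypothetical choices for illustration, not values taken from the text.

```python
from statistics import NormalDist

def power_two_sided_z(mu_0, mu_1, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H_o: mu = mu_0 when the true mean is mu_1."""
    se = sigma / n ** 0.5  # the sampling distribution narrows as n grows
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu_0 - z_crit * se, mu_0 + z_crit * se
    h1 = NormalDist(mu_1, se)
    # Area under H_1's pdf that falls in either tail of the rejection region.
    return h1.cdf(lower) + (1 - h1.cdf(upper))

SIGMA = 3.0                 # assumed population standard deviation (hypothetical)
sample_sizes = (5, 10, 20, 40)
powers = [power_two_sided_z(5.0, 7.0, SIGMA, n) for n in sample_sizes]

for n, p in zip(sample_sizes, powers):
    print(f"n = {n:2d} -> power = {p:.3f}")
```

Under these assumptions, larger samples give a narrower sampling distribution, so more of $H_{1}$’s pdf falls in the rejection region and power increases.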