Hypothesis Testing

A criminal trial is a (non-statistical) example of hypothesis testing. In the U.S. a criminal case begins with a claim that the defendant is innocent. The prosecutor attacks this claim by presenting evidence to show that the claim of innocence is implausible.

When those who are in fact guilty are found guilty and when those who are in fact innocent are found not guilty, the system worked as it was intended to work. However, the system can fail in two ways. The innocent can be found guilty, which means that a true claim has been rejected. Or the guilty can be found innocent, which means that a false claim has been accepted. Statisticians even have names for these two mistakes: rejecting a true claim is a type I error, and accepting a false claim is a type II error.

                      True State:
Decision:       Innocent            Guilty
Innocent        Correct Decision    Type II Error
Guilty          Type I Error        Correct Decision

Are you wondering what this has to do with statistics? Many years ago statisticians realized that the central limit theorem implied that they could use random samples to test claims about populations. One of the first places this method was applied with impressive results was in quality control in manufacturing. The production line was put on trial to see if it was innocent or guilty of error.

Some examples of quality control problems are given in the problems. To illustrate the procedure here, let us use a simpler example. Suppose that you want to test the claim that a coin is fair, that is, it is equally likely to show heads as it is to show tails. A way to test this is to flip it a number of times and see what happens.

Suppose we flip the coin 50 times. What do we expect to happen? It might not come up with exactly 25 heads and 25 tails because of random chance. An outcome of 24 heads, for example, is quite likely even if the coin is fair. On the other hand, if only five heads show, we will reject the claim that the coin is fair because this outcome is unlikely to happen even once in many millions of sequences of 50 tosses.
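The probabilities behind these statements can be checked directly from the binomial distribution. The following is a minimal sketch in Python; the function names binom_pmf and binom_cdf are illustrative choices, not part of the original text.

```python
from math import comb

def binom_pmf(k: int, n: int = 50, p: float = 0.5) -> float:
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int = 50, p: float = 0.5) -> float:
    """Probability of k or fewer heads in n independent tosses."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# A fair coin tossed 50 times, as in the text.
print(f"P(exactly 25 heads) = {binom_pmf(25):.4f}")
print(f"P(exactly 24 heads) = {binom_pmf(24):.4f}")
print(f"P(5 or fewer heads) = {binom_cdf(5):.2e}")
```

With a fair coin, exactly 25 heads occurs only about 11% of the time and exactly 24 heads is almost as likely, while five or fewer heads has a probability of roughly two in a billion.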

You should be wondering where we draw the cutoff line in deciding whether to accept the claim that the coin is fair or to reject the claim. The answer may frustrate you because it frustrates many people when they first encounter it: the cutoff depends on how much type I and type II error we are willing to accept. And to make the problem even more troubling, there is a tradeoff: if we want to decrease type I error, we must accept more type II error, and vice versa.
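The tradeoff can be made concrete with a short calculation. The sketch below assumes a decision rule of the form "reject fairness if either heads or tails shows up cutoff times or fewer in 50 tosses," and it assumes a hypothetical unfair coin that lands heads 65% of the time; that 65% figure is chosen only for illustration, because type II error can be computed only once some specific alternative is assumed.

```python
from math import comb

N = 50  # number of tosses, as in the text

def prob_reject(cutoff: int, p: float, n: int = N) -> float:
    """P(heads <= cutoff or heads >= n - cutoff) when P(heads) = p.

    Assumes cutoff < n / 2 so the two rejection regions do not overlap.
    """
    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p)**(n - k)

    lower = sum(pmf(k) for k in range(0, cutoff + 1))
    upper = sum(pmf(k) for k in range(n - cutoff, n + 1))
    return lower + upper

ASSUMED_BIAS = 0.65  # hypothetical unfair coin, an assumption for illustration

for cutoff in (15, 16, 17, 18, 19):
    type_1 = prob_reject(cutoff, 0.5)               # fair coin wrongly rejected
    type_2 = 1 - prob_reject(cutoff, ASSUMED_BIAS)  # unfair coin not detected
    print(f"cutoff {cutoff}: type I = {type_1:.4f}, type II = {type_2:.4f}")
```

As the cutoff rises, the rejection region grows, so the type I column increases while the type II column decreases.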

Let us return to the criminal trial example. Although people do not like to think about it, innocent people are convicted and guilty people are found not guilty, and we have no way of knowing how often these mistakes are made. (Notice that the system never finds people innocent. A verdict of not guilty does not prove innocence; it merely means that the evidence was insufficient to reject the claim of innocence.) People have a hard time accepting that in making decisions with partial information, mistakes are inevitable. When an innocent person is found guilty, that person's life can be ruined, which is why guilt is supposed to require proof beyond a reasonable doubt. Prosecuting attorneys tell themselves that if the convicted person was innocent of the charge, he was probably guilty of something else. Judges give lighter sentences when the convicted person expresses remorse, and continued insistence on innocence can lead to longer sentences. Judges do not like dealing with the thought that they sometimes make mistakes with serious consequences.

In contrast, statistical decision making begins with the acceptance that we will make mistakes. Before we decide whether we accept or reject a claim, we must agree on how frequently we are willing to make type I error, that is, the probability that we will reject a claim when it is true. If, for example, we are willing to erroneously decide the coin is unfair when it is in fact fair 5% of the time, we will decide that the coin is unfair if either heads or tails comes up 17 or fewer times in 50 tosses. (The probability of that happening with a fair coin is about 3.3%.) If we want stronger evidence, wanting to reject fair coins only 1% of the time, we will decide the coin is unfair only if either heads or tails comes up 15 or fewer times in 50 tosses. Notice that by requiring more evidence, we make it easier for an unfair coin to remain undetected. Finally, if we want to be absolutely certain that a coin is unfair before we reject it, we will never decide the coin is unfair.
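The cutoffs of 17 and 15 follow from the binomial distribution for a fair coin. As a rough sketch (the function names two_sided_tail and largest_cutoff are illustrative, not from the original text), we can search for the largest cutoff whose chance of wrongly rejecting a fair coin stays at or below the agreed type I error rate:

```python
from math import comb

def two_sided_tail(cutoff: int, n: int = 50) -> float:
    """P(heads <= cutoff or tails <= cutoff) for a fair coin.

    Assumes cutoff < n / 2 so the two tails do not overlap.
    """
    lower = sum(comb(n, k) for k in range(cutoff + 1)) / 2**n
    return 2 * lower  # the fair-coin distribution is symmetric

def largest_cutoff(alpha: float, n: int = 50) -> int:
    """Largest cutoff whose type I error rate does not exceed alpha.

    Returns -1 if no cutoff qualifies, meaning the coin is never rejected.
    """
    best = -1
    for cutoff in range((n - 1) // 2 + 1):  # keeps cutoff below n / 2
        if two_sided_tail(cutoff, n) <= alpha:
            best = cutoff
    return best

for alpha in (0.05, 0.01, 0.0):
    c = largest_cutoff(alpha)
    if c < 0:
        print(f"alpha = {alpha}: never reject")
    else:
        print(f"alpha = {alpha}: reject at {c} or fewer, "
              f"actual type I rate {two_sided_tail(c):.4f}")
```

Because the number of heads can take only whole-number values, the actual type I error rate (about 3.3% for a cutoff of 17) usually falls below the nominal level rather than hitting it exactly. Setting alpha to zero returns no usable cutoff, which mirrors the point that demanding absolute certainty means never rejecting any coin.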

Notice that in both the court case and in statistical hypothesis testing, the claim is attacked. If you want to show that a new drug is effective, you must begin by assuming that it does not work, and then you try to demonstrate that this assumption is unreasonable.

Nothing can be proven with absolute certainty with statistics. Although you cannot eliminate error with statistics, you can reduce the cost of error. It is the ability of statistics to reduce the cost of error that has made it an essential tool in practical applications as well as in the pursuit of scientific knowledge.
