ANOVA

ANOVA: Analysis of Variance

In the introduction to hypothesis testing, we explained the t-test procedure for testing a claim about a population mean using the sample mean. A more complicated version of the t-test allows one to test the claim that two groups or treatments have the same mean. For example, we could use a t-test to test the claim that men and women have the same average weight. (You can find information about this test in some introductory statistics texts.)

Suppose, however, we have three or more treatments or groups and we want to know if they are effective or different. For this kind of problem statisticians have developed Analysis of Variance, or ANOVA. The intuition behind the procedure is to compare the variation about the common (overall) mean to the variation about the individual group means. If the two are not much different, knowing the group tells us little and the groups do not matter. If the variation about the group means is much smaller than the variation about the common mean, the groups probably are different.

A simple example will help illustrate what the procedure does. Suppose we have samples from three groups. The sample values of group A are 0, 4, and 5, and these have a mean of 3. The sample values of group B are 1, 2, and 6, and these have a mean of 3. The sample values of group C are 3, 7, and 8, and these have a mean of 6. If we combine all nine numbers, the mean is 4.

The table below shows that if we subtract 4 from each of the 9 numbers, square the results, and then sum them, we get a total of 60. This is called the total sum of squares. If we instead subtract each group's mean from the members of that group, square, and sum, the sum of squares is 42. This is the amount of variation that is not accounted for by the different groups. It follows that the amount of variation that is explained by the groups is 60 - 42 = 18. We can also get this directly: for each observation, subtract the overall mean from the mean of its group, square the result, and then sum.

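| Group | Values | (observed - overall mean)² | (observed - group mean)² | (group mean - overall mean)² |
|-------|--------|----------------------------|--------------------------|------------------------------|
| A     | 0      | 16                         | 9                        | 1                            |
| A     | 4      | 0                          | 1                        | 1                            |
| A     | 5      | 1                          | 4                        | 1                            |
| B     | 1      | 9                          | 4                        | 1                            |
| B     | 2      | 4                          | 1                        | 1                            |
| B     | 6      | 4                          | 9                        | 1                            |
| C     | 3      | 1                          | 9                        | 4                            |
| C     | 7      | 9                          | 1                        | 4                            |
| C     | 8      | 16                         | 4                        | 4                            |
| Sums: | 36     | 60                         | 42                       | 18                           |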
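The sums in the table can be checked with a short script. The following is a minimal Python sketch, not part of the original text; the names `groups`, `mean`, and `sums_of_squares` are chosen here for illustration only. It recomputes the three sums of squares from the sample values and shows that the total splits into the unexplained (within-group) and explained (between-group) parts.

```python
# Sample values from the worked example in the text.
groups = {
    "A": [0, 4, 5],
    "B": [1, 2, 6],
    "C": [3, 7, 8],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (total, within-group, between-group) sums of squares."""
    all_values = [v for values in groups.values() for v in values]
    overall_mean = mean(all_values)

    # Variation of every observation about the overall mean.
    ss_total = sum((v - overall_mean) ** 2 for v in all_values)

    ss_within = 0.0   # variation about each group's own mean (unexplained)
    ss_between = 0.0  # variation of group means about the overall mean (explained)
    for values in groups.values():
        group_mean = mean(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)
        # One squared deviation per observation in the group.
        ss_between += len(values) * (group_mean - overall_mean) ** 2

    return ss_total, ss_within, ss_between


ss_total, ss_within, ss_between = sums_of_squares(groups)
print(ss_total, ss_within, ss_between)  # 60.0 42.0 18.0
```

The between-group term is multiplied by the group size because the last column of the table repeats the same squared difference once for every observation in the group.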

The question at this point is whether this is the sort of result we would get from a random assignment of items to the various groups, or whether it differs from what random assignment would usually produce. To find out, we compute the F statistic, which is the ratio of the explained variation per degree of freedom to the unexplained variation per degree of freedom. Then, using tables or an F-statistic calculator, we find the probability of getting a result at least as extreme as ours by random chance. (This probability is called either a p-value or a level of significance.)
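For the three-group example, the F statistic and its p-value can be worked out by hand or with a few lines of code. The sketch below is an illustration, not the author's procedure. It uses the sums of squares 18 (explained) and 42 (unexplained) from the example, with 3 - 1 = 2 degrees of freedom between groups and 9 - 3 = 6 within groups. The function names are hypothetical. The p-value formula is a closed form of the F distribution's upper tail that holds only when the numerator has exactly 2 degrees of freedom, that is, for exactly three groups; it stands in for the table or calculator lookup.

```python
def f_statistic(ss_between, ss_within, n_groups, n_total):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = n_groups - 1
    df_within = n_total - n_groups
    ms_between = ss_between / df_between  # explained variation per degree of freedom
    ms_within = ss_within / df_within     # unexplained variation per degree of freedom
    return ms_between / ms_within, df_between, df_within


def p_value_three_groups(f_value, df_within):
    """Upper-tail probability of an F(2, df_within) distribution.

    Assumption: this closed form is valid only when the numerator
    degrees of freedom equal 2 (exactly three groups).
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


f_value, df_between, df_within = f_statistic(18, 42, n_groups=3, n_total=9)
p_value = p_value_three_groups(f_value, df_within)
print(round(f_value, 3), round(p_value, 3))  # 1.286 0.343
```

A p-value of about 0.34 means that random assignment would produce an F statistic this large roughly one time in three, so these nine numbers give little evidence that the groups differ. For other numbers of groups the closed form does not apply; a library routine such as `scipy.stats.f_oneway`, if SciPy is installed, handles the general case.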

ANOVA tables are computed as part of regression analysis, which is the primary reason they are included here. They show how much of the original variation in the numbers (the sum of squares around the mean) the regression equation explains and how much of that variation is left unexplained. The R-square is computed from the ANOVA results of the regression, and the level of significance that one should attach to it is given by the significance of the F-statistic.
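To make the connection to regression concrete, here is a small Python sketch under stated assumptions: the data in `x` and `y` are made up for illustration, the model is a simple one-variable least-squares line, and the function name `regression_anova` is hypothetical. It splits the total sum of squares around the mean of `y` into the part the regression line explains and the part left unexplained, then forms the R-square and the F statistic from those pieces.

```python
# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def regression_anova(x, y):
    """Fit y = intercept + slope * x by least squares and return ANOVA quantities."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Least-squares slope and intercept for one explanatory variable.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - y_mean) ** 2 for yi in y)                  # variation around the mean
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left unexplained
    ss_regression = ss_total - ss_residual                          # explained by the line

    r_squared = ss_regression / ss_total
    # One explanatory variable: 1 degree of freedom explained, n - 2 unexplained.
    f_value = (ss_regression / 1) / (ss_residual / (n - 2))
    return {
        "ss_total": ss_total,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "r_squared": r_squared,
        "f_value": f_value,
    }


result = regression_anova(x, y)
print(round(result["r_squared"], 3), round(result["f_value"], 3))  # 0.6 4.5
```

The same ratio applied to the three-group example gives 18 / 60 = 0.3 as the share of the total variation explained by the groups.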
