Chi Square

The Chi-Square Test

The Chi-square test is a non-parametric test. It is very useful when we cannot measure characteristics, but we can classify them.

Suppose, for example, we are interested in how men and women view something that comes in three varieties, which for lack of creativity we will call A, B, and C. We have taken a sample of 100 males and 100 females. Of these 200 people, sixty prefer A, sixty prefer B, and eighty prefer C. The question is, is there a real difference between how males and females view this item? If there is no difference, we would expect to find something close to this table if we tabulate the results:

Expected    Male    Female    Total
Item A        30        30       60
Item B        30        30       60
Item C        40        40       80
Totals       100       100      200
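The expected counts follow from the totals alone: each cell is its row total times its column total, divided by the grand total. A minimal Python sketch of that arithmetic is below; the variable names are illustrative and not part of the original example.

```python
# Totals taken from the example: 60 prefer A, 60 prefer B, 80 prefer C,
# split across 100 males and 100 females.
row_totals = {"Item A": 60, "Item B": 60, "Item C": 80}
col_totals = {"Male": 100, "Female": 100}
grand_total = sum(row_totals.values())  # 200

# Expected count = row total * column total / grand total,
# which is what "no difference between the sexes" implies.
expected = {
    item: {sex: r * c / grand_total for sex, c in col_totals.items()}
    for item, r in row_totals.items()
}

for item, counts in expected.items():
    print(item, counts)
```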

Even if there is no difference, we are unlikely to get exactly the table above. By random chance it is very likely that our sample will give results that are a bit different from the expected results, just as when we flip a fair coin 100 times we most often end up close to 50 heads but not at exactly 50 heads.

Suppose we get the results below. Are they close enough to what we expect to get for us to say the differences look like they could be due to random chance, or are they different enough for us to conclude that random chance does not look like a good explanation of the differences?

Actual      Male    Female    Total
Item A        25        35       60
Item B        25        35       60
Item C        50        30       80
Totals       100       100      200

The Chi-square statistic allows us to compute the probability that we would get a sample result like the one we got by random chance if the claim that the classification categories do not matter is true. Usually we let a statistical program compute it, and you can even find web pages that will do this for you. (See, for example, http://people.ku.edu/~preacher/chisq/chisq.htm.) If you want to compute it by hand, begin by finding the difference between the expected and observed counts for each cell. Then for each cell, square that difference and divide by the expected number of occurrences. Finally, add together all these numbers, and you have the Chi-square statistic.

For our problem, the first four cells (Items A and B) all differ from the expected count by five, and the expected count in each of them is 30. Five squared is 25, so we have four results of 25/30. The last two cells (Item C) differ from the expected count by 10, and the expected count is 40, so we have two results of 100/40. Adding these six results gives us 8.3333.
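The same calculation can be sketched in a few lines of Python. The function name `chi_square_statistic` is made up for this illustration; the counts are the ones from the expected and actual tables.

```python
# Rows are Items A, B, C; columns are Male, Female.
observed = [[25, 35], [25, 35], [50, 30]]
expected = [[30, 30], [30, 30], [40, 40]]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))  # 8.3333
```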

We need to take this result to a Chi-square table, where we will find that we need one more number, the degrees of freedom. If we look at our table and take the totals as given, how many of the interior cells do we need to fill in before we can figure out the rest? It may surprise you that we need only two. If we know the preferences of females for A and B, we can figure out the rest. Hence, there are only two degrees of freedom. More generally, the degrees of freedom are the number of rows less one multiplied by the number of columns less one. Here that is (3 - 1) x (2 - 1) = 2.
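As a small illustration of the rule, the degrees of freedom can be read off the shape of the table of counts, leaving the totals out. The helper name `degrees_of_freedom` is hypothetical.

```python
def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a table of counts without totals."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)

# Three items (rows) by two sexes (columns), as in the example.
observed = [[25, 35], [25, 35], [50, 30]]
df = degrees_of_freedom(observed)
print(df)  # 2
```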

A Chi-square calculator will tell us that, if there were really no difference, we would get a result at least this different from what we expected only 1.55% of the time. Hence, if we want strong evidence that the claim is wrong, say an alpha value of 5%, we would reject the claim that there is no difference by sex and argue that the evidence suggests that there are differences. However, if we want really strong evidence, say an alpha value of only 1%, this result is not strong enough. We will give the benefit of the doubt to the claim that there are no differences by sex because we could get results like the one we obtained as much as 1.55% of the time.
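The 1.55% figure can be checked without a table. With exactly two degrees of freedom, the upper-tail probability of the Chi-square distribution has the simple closed form exp(-x/2). The sketch below relies on that special case only; it does not work for other degrees of freedom, where a statistics library (for example SciPy's `scipy.stats.chi2.sf`) would be the usual route. The function name is made up for this illustration.

```python
import math

def chi_square_p_value_df2(statistic):
    """Upper-tail probability for a chi-square statistic with exactly
    2 degrees of freedom. This shortcut is valid only for df = 2."""
    return math.exp(-statistic / 2)

statistic = 25 / 3  # 8.3333..., the value from the worked example
p_value = chi_square_p_value_df2(statistic)
print(f"{p_value:.4f}")  # 0.0155

# Compare against the two alpha values discussed in the text.
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"alpha = {alpha}: {decision} the claim of no difference")
```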
