
1. In real estate, location is very important. Is it as important in politics? We can investigate this with Chi-Square analysis by looking at the relationship between presidential winner and state location. There are 26 states east of the Mississippi River and 24 west of it. Cross-tabulating state location against the winner of each state in the 2000 presidential election gives the following tables.

WINNER * WHERE Crosstabulation (Count)

                 WHERE:
WINNER:          east   west   Total
bush               13     17      30
gore               13      7      20
Total              26     24      50

Chi-Square Test

                      Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square    2.257    1   .133
N of Valid Cases         50
0 cells (.0%) have expected count less than 5. The minimum expected count is 9.60.

a) If we compute the expected number of eastern states that Bush should have won if geography has no influence, how many do we get?
b) Are these results statistically significant? Explain carefully in a paragraph that someone who has not had statistics would understand.

a) 15.6, found by multiplying 30 by 26 and dividing by 50. Bush won 60% of the states (30/50). If geography makes no difference, he should have won 60% of the 26 eastern states, and 60% of 26 is 15.6.
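b) No, not at the usual significance levels of .05 or .01. Bush actually won 13 eastern states, fewer than the 15.6 we would expect. But if location did not matter, random chance alone would produce results at least this far from the expected values about 13% of the time (the significance of .133). Something that happens 13% of the time is not unusual, so random chance is a plausible explanation for the differences from the expected breakdown.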
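The arithmetic behind answers a) and b) can be checked with a short Python sketch. It uses only the standard library, the variable names are illustrative, and it assumes we compute the p-value from the 1-degree-of-freedom identity P(chi-square >= x) = erfc(sqrt(x/2)) instead of reading it from a table.

```python
import math

# Rows: Bush, Gore. Columns: east, west (number of states won in 2000).
observed = [[13, 17],
            [13, 7]]

row_totals = [sum(row) for row in observed]          # [30, 20]
col_totals = [sum(col) for col in zip(*observed)]    # [26, 24]
n = sum(row_totals)                                  # 50

# Expected count if geography has no influence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

# A 2x2 table has (2-1)*(2-1) = 1 degree of freedom, and for 1 df
# P(chi-square >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"Expected Bush wins in the east: {expected[0][0]:.1f}")  # 15.6
print(f"Chi-square = {chi_sq:.3f}, p = {p_value:.3f}")          # 2.257, 0.133
```

The printed statistic and significance should match the Chi-Square Test output for this problem.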

2. A researcher has surveyed a large number of high school students to determine their attitudes toward college. One question asks whether they have ever heard of Saint Joseph's College. The researcher suspects that as students advance through high school, they learn more about colleges, so seniors should be more likely than freshmen to have heard of Saint Joseph's College. She decides to test this hypothesis using Chi-Square. Below are the results she gets:

QUES7 * YEAR Crosstabulation

                                 YEAR
QUES7                    2003    2004    2005    2006    Total
1      Count              143      87      42      10      282
       Expected Count   132.7    83.7    49.4    16.1    282.0
2      Count              120      79      56      22      277
       Expected Count   130.3    82.3    48.6    15.9    277.0
Total  Count              263     166      98      32      559
       Expected Count   263.0   166.0    98.0    32.0    559.0

Chi-Square Tests

                      Value   df     Asymp. Sig. (2-sided)
Pearson Chi-Square    8.853   ____   ____
N of Valid Cases        559
0 cells (.0%) have expected count less than 5. The minimum expected count is 15.86.

(For this question, an answer of 1 indicates that the student has heard of SJC, while a 2 indicates that the student has not heard of SJC. Year represents year of graduation, so 2003 is a senior and 2006 is a freshman.)

a) The first cell of the table indicates that 143 seniors have heard of SJC. Where does the expected count of 132.7 come from?
b) Just eyeballing these data, do they tend to support the researcher's hypothesis? Explain.
c) To formally test the hypothesis, what does the researcher establish as the null hypothesis? What is the alternative?
d) The results indicate a Chi-Square statistic of 8.853. Set up the equation that yielded this result. (Put in the numbers that you would begin with; there is no need to do any of the calculation.)
e) How many degrees of freedom would this test have? Explain.
f) If you were to test the null hypothesis with alpha = .05, what would you decide? Explain.
g) If you were to test the null hypothesis with alpha = .01, what would you decide? Explain.

The degrees of freedom are 3, and the level of significance (p-value) is .031.
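a) 263*282/559 = 132.7, that is, (column total * row total)/grand total. Overall, 282 of the 559 students (about 50.4%) have heard of SJC; if year in school makes no difference, we expect that same share of the 263 seniors, or 132.7, to have heard of it.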
b) The percentage of students who have heard of the college declines as they get further from graduation: about 54% of seniors (143/263) have heard of it, but only about 31% of freshmen (10/32). So, yes, eyeballing the data suggests the researcher's suspicion may be correct. However, it is not clear whether the trend is strong enough to be more than random.
c) The claim or null hypothesis, which we want to show is incorrect, is that year in school has no impact on whether students have heard of the college. The alternative is that the year in school matters.
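d) Each term is (observed - expected)^2/expected, summed over all eight cells: (143 - 132.7)^2/132.7 + (120 - 130.3)^2/130.3 + (87 - 83.7)^2/83.7 + (79 - 82.3)^2/82.3 + (42 - 49.4)^2/49.4 + (56 - 48.6)^2/48.6 + (10 - 16.1)^2/16.1 + (22 - 15.9)^2/15.9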
e) 3. The degrees of freedom are (number of rows - 1)*(number of columns - 1) = (2 - 1)*(4 - 1) = 1*3 = 3.
f) Since the significance of .031 is below .05, you would reject the null hypothesis that the results are random and conclude that year in school influences how aware students are of the college.
g) Since the significance of .031 is above .01, you would say that the evidence is not strong enough to reject the claim that year in school has no effect on students' awareness of the college.
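A minimal Python sketch of the whole test for Problem 2 follows. It assumes integer degrees of freedom, so the chi-square tail probability can be written in closed form from the regularized incomplete gamma function; the helper names chi2_sf and chi_square_test are our own, not from any library. It reproduces the expected count from part a), the statistic from part d), the degrees of freedom from part e), and the decisions in parts f) and g).

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df >= x), via the closed forms of Q(df/2, x/2)."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y^j / j!
        return sum(math.exp(-y) * y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1))

def chi_square_test(observed):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, stat, df, chi2_sf(stat, df)

# QUES7 (row 1 = heard of SJC, row 2 = not) by graduation year 2003..2006.
observed = [[143, 87, 42, 10],
            [120, 79, 56, 22]]
expected, stat, df, p = chi_square_test(observed)

print(f"Expected seniors who have heard of SJC: {expected[0][0]:.1f}")  # 132.7
print(f"Chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")               # 8.853, 3, 0.031
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {'reject' if p < alpha else 'do not reject'} the null")
```

If SciPy is available, scipy.stats.chi2_contingency(observed) should give the same statistic, p-value, and degrees of freedom for this 2x4 table. Note that for 2x2 tables it applies Yates' continuity correction by default unless correction=False is passed.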

3. A political candidate has a survey done to determine how popular he is with various groups. He finds the following:

                 Age:
Preference:      over 65   under 65   Total
support him           18         12      30
oppose him            22         48      70
Total                 40         60     100
a) Complete a contingency table showing expected frequencies under the null hypothesis that there is no difference in support across age groups.

                 Age:
Preference:      over 65   under 65   Total
support him        _____      _____      30
oppose him         _____      _____      70
Total                 40         60     100

b) How many degrees of freedom will we have in the Chi-square test? Explain how you find out.
c) If he wants to test the hypothesis that he is equally supported in different age groups and he sets his alpha level to .05, what is the critical value of the chi-square statistic?
d) If he sets alpha to .01, what is the critical value of the chi-square statistic? What does this value mean?
e) Compute the chi-square statistic in this case.
f) Do you accept or reject the null hypothesis at the .01 level? What does your decision mean?

a) Each expected count is (row total * column total)/grand total; for example, 30*40/100 = 12. Missing numbers in the first row: 12, 18. In the second row: 28, 42.
b) 1. Degrees of freedom = (rows - 1)*(columns - 1) = (2 - 1)*(2 - 1) = 1.
c) 3.84, found in a chi-square table for 1 degree of freedom and alpha = .05.
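d) 6.635 (rounded to 6.64 in some tables). It means that if age makes no difference, a chi-square statistic this large or larger would arise by random chance only 1% of the time, so any statistic above it leads us to reject the null hypothesis at the .01 level.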
e) 7.143 = (18 - 12)^2/12 + (12 - 18)^2/18 + (22 - 28)^2/28 + (48 - 42)^2/42 = 3 + 2 + 1.286 + 0.857.
f) Reject the claim that age does not matter, since 7.143 is larger than the .01 critical value. If there were no differences by age, the probability of getting a result this far from the expected counts by random chance is only about .0075.
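For Problem 3, the sketch below finds the critical values by bisection instead of a printed table, then computes the expected counts and the statistic. The function names are hypothetical and only the standard library is used; it relies on the 1-degree-of-freedom tail formula P(chi-square >= x) = erfc(sqrt(x/2)).

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 df >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

def critical_value_1df(alpha, lo=0.0, hi=100.0, iters=100):
    """Find x with P(chi-square_1 >= x) = alpha by bisection.

    The tail probability shrinks as x grows, so move right while it is too big.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_sf_1df(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Rows: support, oppose. Columns: over 65, under 65.
observed = [[18, 12],
            [22, 48]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

print(expected)                                                  # [[12.0, 18.0], [28.0, 42.0]]
print(f"{critical_value_1df(0.05):.3f}")                         # 3.841
print(f"{critical_value_1df(0.01):.3f}")                         # 6.635
print(f"chi-square = {stat:.3f}, p = {chi2_sf_1df(stat):.4f}")   # 7.143, 0.0075
```

Bisection is used here only because it is simple and needs no extra libraries; a statistics package's inverse chi-square function would give the same critical values.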

4. A statistics professor believes that students in his morning classes do better than students in his afternoon classes. His department chairman says the differences are random and that students do equally well regardless of time. The professor pulls his grades from past semesters and finds the following:

                         Letter Grade:
Time the Classes Met:     A    B    C    D    F   Total
Morning                  12   18   13   11    6      60
Afternoon                 3    7   17    4    9      40
Total                    15   25   30   15   15     100

Using the Chi-square test, do you conclude that the morning classes and afternoon classes were somehow different?

The Chi-square statistic is 11.083 with 4 degrees of freedom ((2 - 1)*(5 - 1)). The p-value, or level of significance, is .02564. If we require only strong evidence, setting our cut-off level of significance at .05, the evidence is enough to conclude that the morning classes differ from the afternoon classes. However, if we require overwhelming evidence, setting the cut-off at .01, the evidence is not enough to reject the claim that there is no real difference between the classes. By random chance alone, we would get a result this far from the expected values only about 2.6% of the time. Note that the test shows only that the grade distributions differ, not which group did better; the direction comes from the table itself (30 of 60 morning students earned an A or B, versus 10 of 40 afternoon students).
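The numbers for Problem 4 can be reproduced the same way. This Python sketch assumes the closed-form tail for an even number of degrees of freedom, which with 4 df reduces to exp(-x/2) * (1 + x/2); the variable names are illustrative.

```python
import math

# Rows: morning, afternoon. Columns: grades A, B, C, D, F.
observed = [[12, 18, 13, 11, 6],
            [3, 7, 17, 4, 9]]

row_totals = [sum(row) for row in observed]          # [60, 40]
col_totals = [sum(col) for col in zip(*observed)]    # [15, 25, 30, 15, 15]
n = sum(row_totals)                                  # 100
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square over all ten cells.
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(col_totals) - 1)      # (2-1)*(5-1) = 4

# For an even df, the upper tail has a closed form:
# P(chi-square >= x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
y = stat / 2
p_value = math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))

print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.5f}")  # 11.083, 4, 0.02564
```

Since .01 < p < .05, the result is significant at the .05 level but not at the .01 level, matching the conclusion above.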
