Answers: ANOVA

1. How important is the make of car in auto racing? This is a problem that can be solved with Analysis of Variance. We are testing the hypothesis that the chances of winning are unaffected by the make of the car. To do this we get a sample of drivers using Fords, a sample of drivers using Chevys, and a sample of drivers using Pontiacs and see how many top ten finishes they have. If the make of car does not matter, the average number of top ten finishes in each group should be fairly close.

Here are the ANOVA results from such a study:

Model        Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   Significance
Regression   100.888          2                    50.444        .874          .425
Residual     2540.772         44                   57.745
Total        2641.660         48

a) How big was the sample? How can you tell?

b) What does this result tell you? Does it convince you that make of car does matter? Does it prove that make of car does not matter? Carefully explain what it tells you and why it tells you that. (Hint: the null hypothesis is that the make of car does not matter.)

a) 49. Degrees of freedom for the total are n-1.
b) There is no evidence that make of car matters. The differences between the groups were small enough to be nothing more than random noise: if make of car made no difference at all, we would still get a result that explains as much as this one does 42.5% of the time by random chance. This does not prove that make of car does not matter; it only means we cannot reject the null hypothesis.
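The mean squares and the F-statistic in the table for problem 1 can be rechecked from the sums of squares and degrees of freedom. The sketch below is only an arithmetic check: the function name `f_statistic` is made up for illustration, and the inputs are the Regression and Residual rows exactly as printed.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Values copied from the Regression and Residual rows of the table.
ms_reg, ms_res, f_value = f_statistic(100.888, 2, 2540.772, 44)
print(round(ms_reg, 3), round(ms_res, 3), round(f_value, 3))
```

Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. An F below 1, as here, means the groups differ by less than the noise within the groups.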

2. Every fall 20 teams from around the state of Indiana would meet in the cross-country state championships. Some coaches maintain that the four semi-state regions, each of which contributes 5 teams to the state finals, are not equal in talent. They argue that a 6th or 7th place team in a strong semi-state final would easily go to the state finals if they could compete in a weak semi-state final. (The data for this problem was collected before the rules changed and six teams were allowed to advance.)

a) In the graph below we have taken the final placing of the twenty women's teams that made it to the state finals and grouped them by the semi-state that they came from. You can see that the first place team came from group 2, and the 20th place team came from group 4. Based on this graph, which groups look weak and which look strong? Does the contention of the coaches mentioned above look like it may have substance? Explain.

Groups one and two look stronger than groups three and four; the coaches' contention looks like it might be valid.

b) Ultimately, we need to do a statistical analysis to see what we can determine. The null hypothesis will be that all the semi-states are equally strong. We can do an Analysis of Variance test on the rank (place 1 to 20) or on the points scored (in cross country, like golf, fewer points are better). Based on the Analysis of Variance results, should we accept the hypothesis that all semi-state regions are equally strong, or should we reject it and decide that the coaches in the introduction are right? Explain.

ANOVA

                          Sum of Squares   df   Mean Square   F       Sig.
RANK     Between Groups   341.400          3    113.800       5.627   .008
         Within Groups    323.600          16   20.225
         Total            665.000          19
Points   Between Groups   153059.800       3    51019.933     6.630   .004
         Within Groups    123126.400       16   7695.400
         Total            276186.200       19

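It appears that the coaches are correct, so we reject the hypothesis that all semi-states are equally strong. If the semi-states were all equal, we would get results like these by random chance less than 1% of the time (Sig. = .008 for rank and .004 for points).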
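The decision in part b) comes down to comparing each reported significance with a cutoff. Here is a minimal sketch of that rule; the 0.01 cutoff is an assumption chosen to match the "less than 1%" wording, and the name `decide` is hypothetical.

```python
ALPHA = 0.01  # assumed cutoff, not stated in the problem

# Sig. values copied from the ANOVA table for problem 2.
reported_sig = {"RANK": 0.008, "Points": 0.004}

def decide(p_value, alpha=ALPHA):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return "reject" if p_value < alpha else "fail to reject"

decisions = {name: decide(p) for name, p in reported_sig.items()}
print(decisions)
```

Under the same rule, the .425 from problem 1 gives "fail to reject", which is why the two problems reach opposite conclusions.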

c) If we try to predict how well a team does, we can use regression with final rank as the dependent variable and, as independent variables, semi-state rank plus dummy variables indicating the semi-state in which the team ran. Valparaiso finished first in the NP semi-state. What rank do we predict for it in the state meet?

8.450 - 9.2 + 2.250 = 1.5. We predict that they would finish first or second.

d) Penn High School finished sixth in the NP semi-state, and did not go on to the state meet. If they had been allowed to go, where does this regression predict they would have finished?

8.450 - 9.2 + 2.250 * 6 = 12.75. We predict that they would have finished about thirteenth.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .904   .818       .769                2.8414

a Predictors: (Constant), SEMIRANK, MAN, FC, NP

Coefficients

                     Unstandardized Coefficients   Standardized Coefficients
Model                B          Std. Error         Beta                        t        Sig.
1       (Constant)   8.450      1.852                                          4.562    .000
        NP           -9.200     1.797              -.691                       -5.120   .000
        FC           -8.400     1.797              -.631                       -4.674   .000
        MAN          -1.200     1.797              -.090                       -.668    .514
        SEMIRANK     2.250      .449               .552                        5.008    .000

a Dependent Variable: RANK
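The predictions in parts c) and d) can be reproduced from the B column of the Coefficients table. This is a sketch under the assumption that NP, FC, and MAN are 0/1 dummy variables and that the remaining semi-state is the baseline with no adjustment; the function name `predict_state_rank` is made up for illustration.

```python
# Unstandardized coefficients (column B) copied from the Coefficients table.
CONSTANT = 8.450
SEMIRANK_SLOPE = 2.250
SEMI_STATE_EFFECT = {"NP": -9.200, "FC": -8.400, "MAN": -1.200}

def predict_state_rank(semi_state, semi_rank):
    """Predicted state-meet rank for a team that placed `semi_rank` in `semi_state`.

    A semi-state without a dummy variable (the assumed baseline) contributes 0.
    """
    effect = SEMI_STATE_EFFECT.get(semi_state, 0.0)
    return CONSTANT + effect + SEMIRANK_SLOPE * semi_rank

valparaiso = predict_state_rank("NP", 1)  # first in the NP semi-state
penn = predict_state_rank("NP", 6)        # sixth in the NP semi-state
print(round(valparaiso, 2), round(penn, 2))
```

The two calls correspond to the hand calculations 8.450 - 9.2 + 2.250 * 1 and 8.450 - 9.2 + 2.250 * 6.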

3. After running a regression, I received the following Analysis of Variance results:

Source       Sum of Squares   Deg Freedom   Mean Square   F
Regression   8.70             4             2.18          7.18
Residuals    6.97             23            .30
Total        15.67            27            .58

a) What would the R² be for this regression?

b) How could we tell if it is what we would expect to get by random chance, or if it is bigger than what we would expect to get by random chance?

a) R-square = 8.7/15.67 = .555
b) We need the significance (p-value) of the F statistic. If it is small, the R² is bigger than we would expect to get by random chance.
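The R² in part a) and the F in the table both follow directly from the sums of squares. A short arithmetic check, using only the numbers printed in the table for problem 3 (the variable names are illustrative):

```python
# Values copied from the Analysis of Variance table for problem 3.
ss_regression, df_regression = 8.70, 4
ss_residual, df_residual = 6.97, 23
ss_total = 15.67

# R-squared is the share of the total sum of squares explained by the regression.
r_squared = ss_regression / ss_total

# F is the regression mean square divided by the residual mean square.
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 3), round(f_value, 2))
```

Turning this F into a significance level requires the F distribution with 4 and 23 degrees of freedom, which this sketch does not compute.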
