| | Sum of Squares | df | Mean Square | F | Sig |
| Between Groups | 12046.467 | 3 | 4015.489 | 6.226 | .001 |
c) We can do exactly the same analysis with regression. We do it with what are called dummy variables, which are variables that indicate off/on: Off = 0, On = 1. We will use three of them, labeled fc (Franklin Central), man (Manchester), and np (New Prairie). How does the ANOVA table you get here compare with the ANOVA table in part a? (They are different because in part a we are analyzing rank, or how the runners placed, and below we are analyzing their running time in seconds.) What does the R-squared tell us? According to the regression coefficients, girls from which semistate (FC, NP, MAN, or Terre Haute, which is not listed separately) are slowest?
R SQUARE = .247
ADJUSTED R SQUARE = .207

| | Sum of Squares | df | Mean Square | F | Sig |
| Regression | 15075.650 | 3 | 5025.217 | 6.136 | .001 |

| | Regression Coefficient | StdError | t | Sig |
| Constant | 933.467 | 7.389 | 126.327 | .000 |
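The claim in part c that dummy-variable regression reproduces the one-way ANOVA can be checked numerically. The sketch below uses made-up running times (hypothetical numbers, not the meet data) and the hypothetical group label `th` for the omitted baseline group. It computes the F statistic both ways and shows how the constant and the dummy coefficients relate to the group means.

```python
import numpy as np

# Hypothetical running times in seconds for four groups (made-up numbers,
# NOT the meet data). "th" plays the role of the omitted baseline group.
times = {
    "th": [930.0, 941.0, 925.0, 938.0],
    "fc": [950.0, 962.0, 947.0, 955.0],
    "man": [921.0, 934.0, 928.0, 917.0],
    "np": [960.0, 948.0, 971.0, 965.0],
}

y = np.array([t for group in times.values() for t in group])
labels = np.array([name for name, group in times.items() for _ in group])
n = len(y)
k = len(times) - 1  # number of dummy variables

# One-way ANOVA: between-groups and within-groups sums of squares.
grand_mean = y.mean()
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in times.values())
ss_within = sum(((np.array(g) - np.mean(g)) ** 2).sum() for g in times.values())
f_anova = (ss_between / k) / (ss_within / (n - k - 1))

# Regression on an intercept plus 0/1 dummies for fc, man, and np.
dummies = [(labels == name).astype(float) for name in ("fc", "man", "np")]
X = np.column_stack([np.ones(n)] + dummies)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
ss_regression = ((fitted - grand_mean) ** 2).sum()
ss_residual = ((y - fitted) ** 2).sum()
f_regression = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"ANOVA F = {f_anova:.3f}, regression F = {f_regression:.3f}")
print(f"constant = {coef[0]:.2f} (mean of the baseline group)")
print(f"dummy coefficients = {np.round(coef[1:], 2)} (group mean minus baseline mean)")
```

In this setup the constant is the mean time of the group without a dummy, and each dummy coefficient is that group's mean minus the baseline mean, so the largest positive coefficient marks the slowest group.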
d) If we look at the scatter diagram of state times and semistate times, we get the picture below. If we look at the correlation, we get the correlation below. What do they tell us?
Correlation between state time and semi-state time = .628; significance is .000 (or less than 1 in 1000).
e) If we try to explain the time that a girl ran in the state meet with regression, using her time in the semi-state meet as the independent variable, we get the following. How well does the semi-state time predict the state time? What would we predict for a girl who ran a semi-state time of 900 seconds (that is, 15 minutes)? Do we see regression toward the mean?
R SQUARE = .395
ADJUSTED R SQUARE = .384

| | Sum of Squares | df | Mean Square | F | Sig |
| Regression | 24059.334 | 1 | 24059.334 | 37.836 | .000 |

| | Regression Coefficient | StdError | t | Sig |
| Constant | 243.037 | 109.121 | 2.227 | .030 |
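Two facts about one-predictor regression connect parts d and e. R-squared is the square of the correlation, and when both variables are measured in standard-deviation units the slope equals the correlation. The sketch below uses only the reported correlation of .628. The value `z_semistate` is a hypothetical illustration, not a runner from the data.

```python
r = 0.628  # correlation between state and semi-state times, from part d

# With a single predictor, R-squared equals the squared correlation.
r_squared = r ** 2

# In standard-deviation units the fitted slope equals r, so a runner who is
# z_semistate SDs from the semi-state mean is predicted to be only
# r * z_semistate SDs from the state mean.
z_semistate = 2.0  # hypothetical: two SDs slower than average
z_state_predicted = r * z_semistate

print(f"r squared = {r_squared:.3f}")
print(f"predicted state z-score = {z_state_predicted:.3f}")
```

The squared correlation comes to about .394, which agrees with the reported R SQUARE of .395 up to rounding of the correlation. Because the correlation is less than 1, the predicted state z-score is always closer to zero than the semi-state z-score, which is what regression toward the mean looks like.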
f) We might do better than this if we also include the variables that tell which semistate a runner came from. Those results are below. By how much has our R-squared increased? Which semistate has the fastest course? Which seems to have the slowest course? What time would we predict for a girl who runs 900 seconds in the NP semistate?
R SQUARE = .736
ADJUSTED R SQUARE = .716

| | Sum of Squares | df | Mean Square | F | Sig |
| Regression | 44830.269 | 4 | 11207.567 | 38.262 | .000 |

| | Regression Coefficient | StdError | t | Sig |
| Constant | -162.846 | 1008.865 | -1.496 | .140 |
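One way to judge whether the three semistate dummies add real explanatory power is a partial F test, sketched below from the numbers printed in parts e and f. It assumes the F column is the regression mean square divided by the residual mean square, the usual convention. The residual mean square is not printed in this output, so it is backed out from that relationship.

```python
# Values copied from the regression outputs in parts e and f.
r2_reduced, ssr_reduced, df_reduced = 0.395, 24059.334, 1  # semi-state time only
r2_full, ssr_full, df_full = 0.736, 44830.269, 4           # plus three dummies
ms_regression_full, f_full = 11207.567, 38.262

# Gain in explained variance from adding the semistate dummies.
r2_gain = r2_full - r2_reduced

# F = (regression mean square) / (residual mean square), so the residual
# mean square of the full model can be recovered from the table.
ms_residual_full = ms_regression_full / f_full

# Partial F: extra regression sum of squares per added variable,
# relative to the full model's residual mean square.
extra_df = df_full - df_reduced
partial_f = ((ssr_full - ssr_reduced) / extra_df) / ms_residual_full

print(f"R-squared gain = {r2_gain:.3f}")
print(f"residual mean square (full model) = {ms_residual_full:.1f}")
print(f"partial F for the {extra_df} dummies = {partial_f:.2f}")
```

A large partial F suggests that knowing the semistate improves the prediction beyond what the semi-state time alone provides.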
3. Two alumni of the University of Wisconsin wrote a research paper trying to explain variations in salary levels of 45 members of the University of Wisconsin economics department. Here were their published results:
REGRESSION OF FACULTY SALARY LEVELS ON EXPERIENCE, PUBLISHING PERFORMANCE, TEACHING PERFORMANCE AND ADMINISTRATIVE DUTIES
| Independent Variables | Coefficient (in dollars) | Standard Error (in dollars) | t-ratio |
| Experience | 253.28 | 59.71 | 4.24* |
| Monographs | -5.72 | 162.01 | -0.04 |
| Articles in National Journals | 392.46 | 90.64 | 4.33* |
| Articles in Specialty Journals | 344.59 | 90.45 | 3.81* |
| Other Publications | 76.49 | 24.31 | 3.15* |
| Transformed Teaching Score | 731.67 | 429.82 | 1.70 |
| Administrative Duties | 5,208.90 | 807.46 | 6.45* |
| Intercept | 12,127.10 | | |
*Significant at the .01 level
Source: American Economic Review, May 1973 (Vol. 63 No. 2) p 313 |
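The last column of the salary table can be reproduced by dividing each coefficient by its error, which is how a t-ratio is normally formed. The sketch below does that for every row and flags the ratios that clear a .01 cutoff. Two assumptions are involved. The Transformed Teaching Score coefficient is read as 731.67, which is consistent with its reported ratio of 1.70. The two-tailed cutoff of roughly 2.7 is an approximation for about 37 degrees of freedom (45 faculty minus 8 estimated parameters).

```python
# (coefficient, error, reported ratio), all taken from the table.
rows = {
    "Experience": (253.28, 59.71, 4.24),
    "Monographs": (-5.72, 162.01, -0.04),
    "Articles in National Journals": (392.46, 90.64, 4.33),
    "Articles in Specialty Journals": (344.59, 90.45, 3.81),
    "Other Publications": (76.49, 24.31, 3.15),
    # Assumption: the teaching-score coefficient is 731.67.
    "Transformed Teaching Score": (731.67, 429.82, 1.70),
    "Administrative Duties": (5208.90, 807.46, 6.45),
}

CUTOFF = 2.7  # assumption: approximate two-tailed .01 critical value, ~37 df

# t-ratio = coefficient divided by its standard error.
t_ratios = {name: coef / se for name, (coef, se, _) in rows.items()}
starred = {name for name, t in t_ratios.items() if abs(t) > CUTOFF}

for name, (coef, se, reported) in rows.items():
    flag = "*" if name in starred else ""
    print(f"{name}: computed {t_ratios[name]:.2f}{flag}, reported {reported:.2f}")
```

Every computed ratio rounds to the reported one, and the flagged rows are the same five that carry an asterisk in the table.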
4. Several different groups attempt to measure how conservative or liberal congressmen are. Among these groups were the Americans for Democratic Action (ADA), the AFL-CIO Committee on Political Education (COPE), the National Farmers Union (NFU), and the Americans for Constitutional Action (ACA). Below is a matrix showing correlations between ratings given by these various interest groups many years ago. (Data from Journal of Law and Economics, Dec. 1979, p. 369.)
[Correlation matrix with rows and columns for the groups ADA, COPE, NFU, and ACA; the entries are not reproduced here.]
*Significant at the .01 level. |
Here is a regression that tried to explain the ADA rating of over 400 congressmen by looking at the party of the congressman (1 = Democrat, 0 = Republican) and whether the congressman was from the North (= 0) or South (= 1).
ADA = [estimated equation not reproduced here]
Corrected R2 = .55
5. Below are some data for used Cadillacs from several years ago. In addition to the price of the car, they include the age of the car, the year of the car, and the number of miles on the car. Using regression to see how price depended on age and miles (miles are measured in thousands, i.e., 1 = 1,000 miles) gives the results below:
Model Summary: Predictors: (Constant), AGE, MILES; Dependent Variable: PRICE
| R | R Square | Adjusted R Square | Std. Error of the Estimate |
| .928 | .861 | .852 | 3010.4572 |
ANOVA
| | Sum of Squares | df | Mean Square | F | Sig. |
| Regression | 1742354183.829 | 2 | 871177091.915 | 96.126 | .000 |
| Residual | 280948430.641 | 31 | 9062852.601 | | |
| Total | 2023302614.471 | 33 | | | |
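The entries in the Model Summary and ANOVA output are tied together by a few identities, and recomputing them is a useful check on how to read the tables. The sketch below uses only the sums of squares and degrees of freedom printed above. The variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 1742354183.829, 2
ss_residual, df_residual = 280948430.641, 31
ss_total, df_total = 2023302614.471, 33

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = regression mean square / residual mean square.
f_stat = ms_regression / ms_residual

# R Square = share of the total sum of squares explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# Std. Error of the Estimate = square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"F = {f_stat:.3f}")
print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.4f}")
```

The recomputed values agree with the printed ones to the precision shown, so a typical prediction from this model misses the actual price by about $3,000.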
| | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| (Constant) | 28213.880 | 1185.683 | | 23.795 | .000 |
| MILES | -117.329 | 24.242 | -.402 | -4.840 | .000 |
| AGE | -1125.172 | 148.238 | -.631 | -7.590 | .000 |
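The unstandardized coefficients define a prediction equation: PRICE = 28213.880 - 1125.172 × AGE - 117.329 × MILES, with MILES in thousands. The sketch below wraps that equation in a hypothetical helper, `predicted_price`, and applies it to an illustrative car that is 5 years old with 50 thousand miles. That car is not one of the cars in the data.

```python
def predicted_price(age_years, miles_thousands):
    """Predicted price in dollars from the fitted coefficients in the table."""
    constant = 28213.880
    b_age = -1125.172    # dollars per additional year of age
    b_miles = -117.329   # dollars per additional thousand miles
    return constant + b_age * age_years + b_miles * miles_thousands


# Hypothetical car: 5 years old with 50 thousand miles.
example_price = predicted_price(5, 50)
print(f"predicted price = ${example_price:,.2f}")

# Holding miles fixed, one extra year of age changes the prediction by b_age.
one_year_effect = predicted_price(6, 50) - predicted_price(5, 50)
print(f"effect of one more year = ${one_year_effect:,.2f}")
```

Each coefficient is read holding the other variable fixed: an extra year lowers the predicted price by about $1,125 at the same mileage, and an extra thousand miles lowers it by about $117 at the same age.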