

                  Sum of Squares   df   Mean Square       F    Sig
Between Groups         12046.467    3      4015.489   6.226   .001
c) We can do exactly the same analysis with regression.
We do it with what are called dummy variables, which are
variables that indicate off/on: off = 0, on = 1. We will use
three of them, labeled fc (Franklin Central), man
(Manchester), and np (New Prairie). How does the ANOVA table
you get here compare with the ANOVA table in part a?
(They are different because in part a we are
analyzing rank, or how the runners placed, and below we are
analyzing their running time in seconds.)
What does the R-squared tell us?
According to the regression coefficients, girls from which
semistate (FC, NP, MAN, or Terre Haute, which is not listed
separately) are slowest?
R SQUARE = .247
ADJUSTED R SQUARE = .207

              Sum of Squares   df   Mean Square       F    Sig
Regression         15075.650    3      5025.217   6.136   .001

              Regression Coefficient   Std Error         t    Sig
Constant                     933.467       7.389   126.327   .000
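To make the dummy-variable idea in part c concrete, here is a minimal sketch in Python. The times are hypothetical, made up for illustration, and are not the data behind the output above; the group label "th" for Terre Haute is also an assumption. The sketch shows that when a regression contains only dummies, the constant is the mean of the omitted group and each dummy coefficient is that group's mean minus the omitted group's mean. It confirms this by checking the least-squares conditions (residuals sum to zero overall and within each dummy's group).

```python
# Hypothetical finishing times in seconds, NOT the data from the problem.
times = {
    "th":  [930.0, 940.0, 935.0],   # Terre Haute: baseline, no dummy of its own
    "fc":  [950.0, 960.0],
    "man": [955.0, 965.0, 975.0],
    "np":  [900.0, 910.0],
}
dummies = ["fc", "man", "np"]


def mean(xs):
    return sum(xs) / len(xs)


# One row per runner: (time, [fc, man, np]) with each dummy equal to 0 or 1.
rows = [(t, [1 if group == d else 0 for d in dummies])
        for group, ts in times.items() for t in ts]

# Candidate least-squares solution: constant = baseline mean,
# each dummy coefficient = that group's mean minus the baseline mean.
constant = mean(times["th"])
coefs = [mean(times[d]) - constant for d in dummies]


def predict(x):
    return constant + sum(b * xi for b, xi in zip(coefs, x))


residuals = [t - predict(x) for t, x in rows]

# Normal equations: residuals are orthogonal to the constant and to each dummy.
checks = [sum(residuals)] + [
    sum(r * x[j] for r, (_, x) in zip(residuals, rows))
    for j in range(len(dummies))
]

print("constant:", constant)
print("coefficients:", dict(zip(dummies, coefs)))
print("normal-equation checks:", checks)
```

Read this way, a positive dummy coefficient means that group's average time is slower than the omitted group's, and a negative one means it is faster.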
d) If we look at the scatter diagram of state times and semistate times, we get the picture below. If we look at the correlation, we get the correlation below. What do they tell us?
Correlation between state time and semistate time = .628; significance is .000 (or less than 1 in 1000).
e) If we try to explain the time that a girl ran in the state meet with a regression that uses her time in the semistate meet as the independent variable, we get the following. How well does the semistate time predict the state time? What would we predict for a girl who ran a semistate time of 900 seconds (that is, 15 minutes)? Do we see regression toward the mean?
R SQUARE = .395
ADJUSTED R SQUARE = .384

              Sum of Squares   df   Mean Square        F    Sig
Regression         24059.334    1     24059.334   37.836   .000

              Regression Coefficient   Std Error       t    Sig
Constant                     243.037     109.121   2.227   .030
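Parts d and e are linked: in a regression with one independent variable, R-squared is the square of the correlation. The short Python sketch below checks that link using the two figures reported above. It also illustrates, in standardized units, why a correlation below 1 implies regression toward the mean. The function name predicted_z is a hypothetical helper introduced here, not part of the original output.

```python
r = 0.628                 # correlation reported in part d
reported_r_square = 0.395  # R SQUARE reported in part e

# With a single predictor, R-squared equals the squared correlation.
r_square_from_r = r ** 2
print(round(r_square_from_r, 3), "vs reported", reported_r_square)


def predicted_z(z_semistate):
    """Predicted state-meet z-score for a given semistate z-score.

    In simple regression the standardized slope equals the correlation.
    """
    return r * z_semistate


# A runner 2 standard deviations from the semistate mean is predicted
# to be closer to the mean (in standard deviations) at the state meet.
print(predicted_z(2.0))
```

The small gap between the squared correlation and the reported R-squared is what one would expect from rounding the correlation to three decimals.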
f) We might do better than this if we also include the variables that tell which semistate a runner came from. Those results are below. By how much has our R-squared increased? Which semistate has the fastest course? Which seems to have the slowest course? What time would we predict for a girl who runs 900 seconds in the NP semistate?
R SQUARE = .736
ADJUSTED R SQUARE = .716

              Sum of Squares   df   Mean Square        F    Sig
Regression         44830.269    4     11207.567   38.262   .000

              Regression Coefficient   Std Error       t    Sig
Constant                     162.846     108.865   1.496   .140
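One way to compare the models in parts e and f is to work directly from the reported R-squared values and regression sums of squares. The Python sketch below computes the increase in R-squared and checks that both models imply about the same total sum of squares, as they should if they explain the same dependent variable for the same runners. It uses only numbers printed above; the variable names are illustrative.

```python
# Reported values from part e (semistate time only)
r_square_e = 0.395
ss_regression_e = 24059.334

# Reported values from part f (semistate time plus the three dummies)
r_square_f = 0.736
ss_regression_f = 44830.269

# Increase in the share of variation explained
increase = r_square_f - r_square_e
print("increase in R-squared:", round(increase, 3))

# R-squared = regression SS / total SS, so total SS = regression SS / R-squared.
total_ss_e = ss_regression_e / r_square_e
total_ss_f = ss_regression_f / r_square_f
print("implied total SS:", round(total_ss_e), round(total_ss_f))
```

The two implied totals differ only because the printed R-squared values are rounded to three decimals.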
3. Two alumni of the University of Wisconsin wrote a research paper trying to explain variations in salary levels of 45 members of the University of Wisconsin economics department. Here were their published results:
REGRESSION OF FACULTY SALARY LEVELS ON EXPERIENCE, PUBLISHING PERFORMANCE, TEACHING PERFORMANCE AND ADMINISTRATIVE DUTIES

Independent Variables             Coefficient    Standard Error       t
                                  (in dollars)     (in dollars)
Experience                             253.28            59.71     4.24*
Monographs                               5.72           162.01     0.04
Articles in National Journals          392.46            90.64     4.33*
Articles in Specialty Journals         344.59            90.45     3.81*
Other Publications                      76.49            24.31     3.15*
Transformed Teaching Score             731.67           429.82     1.70
Administrative Duties                5,208.90           807.46     6.45*
Intercept                           12,127.10

*Significant at the .01 level
Source: American Economic Review, May 1973 (Vol. 63 No. 2) p 313 
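The last column of the salary table can be checked against the other two, since each t value is the coefficient divided by its standard error. The Python sketch below does that division for every row. One assumption is flagged in the code: the teaching-score coefficient is taken as 731.67, the value consistent with its printed t of 1.70.

```python
# (coefficient, standard error, printed t) for each row of the salary table.
salary_rows = {
    "Experience": (253.28, 59.71, 4.24),
    "Monographs": (5.72, 162.01, 0.04),
    "Articles in National Journals": (392.46, 90.64, 4.33),
    "Articles in Specialty Journals": (344.59, 90.45, 3.81),
    "Other Publications": (76.49, 24.31, 3.15),
    # Assumption: coefficient read as 731.67, which matches the printed t.
    "Transformed Teaching Score": (731.67, 429.82, 1.70),
    "Administrative Duties": (5208.90, 807.46, 6.45),
}

# t = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se, _) in salary_rows.items()}

for name, (coef, se, printed_t) in salary_rows.items():
    print(f"{name:32s} computed t = {t_ratios[name]:5.2f}   printed t = {printed_t:.2f}")
```

A coefficient that is small relative to its standard error, such as the one for monographs, produces a t near zero, which is why that row carries no significance star.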
4. Several different groups attempt to measure how conservative or liberal congressmen are. Among these groups were the Americans for Democratic Action (ADA), the AFL-CIO Committee on Political Education (COPE), the National Farm Union (NFU), and the Americans for Constitutional Action (ACA). Below is a matrix showing correlations between ratings given by these various interest groups many years ago. (Data from Journal of Law and Economics, Dec. 1979, p 369.)


Groups:   ADA   COPE   NFU   ACA

*Significant at the .01 level.
Here is a regression that tried to explain the ADA rating of over 400 congressmen by looking at the party of the congressman (1 = Democrat, 0 = Republican) and whether the congressman was from the North (= 0) or South (= 1):
ADA = 



Corrected R^{2}=.55 



5. Below are some data for used Cadillacs from several years ago. In addition to the price of the car, they include the age of the car, the year of the car, and the number of miles the car has been driven. Using regression to see how price depends on age and miles (miles are measured in thousands, i.e., 1 = 1,000 miles) gives the results below:
Model Summary: Predictors: (Constant), AGE, MILES; Dependent Variable: PRICE
R       R Square   Adjusted R Square   Std. Error of the Estimate
.928    .861       .852                3010.4572

ANOVA
              Sum of Squares   df     Mean Square        F   Sig.
Regression    1742354183.829    2   871177091.915   96.126   .000
Residual       280948430.641   31     9062852.601
Total         2023302614.471   33

              Unstandardized Coefficients   Standardized Coefficients
                      B        Std. Error                        Beta        t   Sig.
(Constant)    28213.880          1185.683                                23.795   .000
MILES           117.329            24.242                        .402     4.840   .000
AGE            1125.172           148.238                        .631     7.590   .000
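The summary figures in the Cadillac output all follow from the ANOVA sums of squares and degrees of freedom. The Python sketch below recomputes R, R Square, Adjusted R Square, F, and the standard error of the estimate from those entries, using the standard definitions. The variable names are illustrative.

```python
import math

# Entries taken from the ANOVA table
ss_regression = 1742354183.829
ss_residual = 280948430.641
df_regression = 2
df_residual = 31

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total
r = math.sqrt(r_square)
# Adjusted R-squared compares residual and total variance estimates.
adj_r_square = 1 - ms_residual / (ss_total / df_total)
f_stat = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"R = {r:.3f}, R Square = {r_square:.3f}, Adjusted = {adj_r_square:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error_estimate:.4f}")
```

Each recomputed value can be compared with the corresponding printed entry in the Model Summary and ANOVA tables.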
