
### Problems: Simple Regression

(It is useful to calculate a regression coefficient by hand at least once. However, no one doing useful work calculates by hand; they use statistical programs. Here is a simple regression coefficient calculator from the Internet: http://www.easycalculation.com/statistics/regression.php.)

1. For the data below, compute regression coefficients a and b.

| Y | X |
|---|---|
| 5 | 7 |
| 9 | 4 |
| 10 | 1 |
| 3 | 10 |
| 8 | 3 |

Find the R² for the least squares regression line that you found.
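A hand calculation for Problem 1 can be checked with a few lines of code. The sketch below is one way to do it in Python, assuming the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X) and taking R² as the squared correlation. The function name `least_squares` is illustrative and not part of the original problem set.

```python
def least_squares(xs, ys):
    """Return (a, b, r_squared) for the fitted line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Data from the table in Problem 1.
X = [7, 4, 1, 10, 3]
Y = [5, 9, 10, 3, 8]

a, b, r_squared = least_squares(X, Y)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r_squared:.3f}")
```

Comparing the printed values with the hand-computed ones is a quick way to catch an arithmetic slip.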

2. A regression is run using 100 observations to determine the relationship between price and the number of pages in a book. The regression yields this equation:

Price = 1.41 + 1.32(Number of pages)
a) What price does this equation predict for a book with 500 pages?
b) If the standard deviation of the regression coefficient for pages is .13, what is a 95% confidence interval for the true coefficient?
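The arithmetic in Problem 2 can be sketched as follows. This is only an illustration: it assumes the common "estimate plus or minus 1.96 standard errors" rule for a 95% interval, which is a normal approximation. The variable names are made up for this example.

```python
intercept = 1.41
slope = 1.32        # estimated coefficient on number of pages
se_slope = 0.13     # standard deviation of that coefficient, as given

# (a) Predicted price for a 500-page book.
predicted_price = intercept + slope * 500

# (b) Approximate 95% confidence interval for the true coefficient.
# Assumption: multiplier of 1.96 (normal approximation). With 100
# observations the exact t multiplier (98 degrees of freedom) is only
# slightly larger, about 1.98.
z = 1.96
lower = slope - z * se_slope
upper = slope + z * se_slope

print(f"predicted price: {predicted_price:.2f}")
print(f"95% interval for the coefficient: ({lower:.4f}, {upper:.4f})")
```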

3. The regression equation for the numbers in the following table is Y = 8 + .5X. What is the standard error of estimate?

| X | Y | Predicted Y | e² |
|---|---|---|---|
| 4 | 9 | | |
| 5 | 11 | | |
| 10 | 13 | | |
| 6 | 10 | | |
| 5 | 12 | | |
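The two empty columns in the table for Problem 3 can be filled in mechanically. The sketch below does so in Python and then computes the standard error of estimate, assuming the usual definition with n − 2 in the denominator (two parameters, a and b, are estimated). If a text uses a different divisor, the last line changes accordingly.

```python
from math import sqrt

# X and Y columns from the table in Problem 3.
X = [4, 5, 10, 6, 5]
Y = [9, 11, 13, 10, 12]

# Predicted Y from the given equation Y = 8 + .5X.
predicted = [8 + 0.5 * x for x in X]

# Squared residuals: the e² column.
squared_errors = [(y - p) ** 2 for y, p in zip(Y, predicted)]

# Assumption: standard error of estimate = sqrt(sum of e² / (n - 2)).
n = len(X)
std_error_of_estimate = sqrt(sum(squared_errors) / (n - 2))

for x, y, p, e2 in zip(X, Y, predicted, squared_errors):
    print(x, y, p, e2)
print(f"standard error of estimate: {std_error_of_estimate:.4f}")
```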

4. Suppose we have run a regression with five observations and we have the following results:

| X | error |
|---|---|
| 5 | -1 |
| 4 | 1 |
| 1 | 0 |
| 2 | ? |
| 0 | ? |

What are the last two values for the residuals? (Hint: They must sum to zero, and the correlation of the error terms and the independent variables must be zero.)
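The hint in Problem 4 amounts to two linear equations in the two unknown residuals. Given that the residuals sum to zero, zero correlation with X is the same as requiring the sum of X times the residual to be zero. A minimal sketch of solving that pair of equations in Python (variable names are illustrative):

```python
X = [5, 4, 1, 2, 0]
known = [-1, 1, 0]   # residuals for the first three observations

# Condition 1: all residuals sum to zero
#     e4 + e5 = -(sum of known residuals)
# Condition 2: sum of X * residual is zero
#     x4*e4 + x5*e5 = -(sum of X * residual over the known rows)
rhs_sum = -sum(known)
rhs_cross = -sum(x * e for x, e in zip(X, known))
x4, x5 = X[3], X[4]

# Solve the 2x2 system [[1, 1], [x4, x5]] by Cramer's rule.
det = x5 - x4
e4 = (rhs_sum * x5 - rhs_cross) / det
e5 = (rhs_cross - rhs_sum * x4) / det

residuals = known + [e4, e5]
print(residuals)
print(sum(residuals), sum(x * e for x, e in zip(X, residuals)))
```

The last line prints both conditions, which should each come out to zero.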

5. Two researchers were interested in what relationship, if any, existed between a teacher's teaching effectiveness (measured by student evaluations) and his/her research ability (measured by the number of books or articles published over a three-year period). Taking a sample of 69, they obtained this result:

Teaching Effectiveness = 387.22 + 3.137(Research Ability)
R² = .155; t-value for the regression coefficient = 3.51

a) What does the coefficient on Research Ability tell you?
b) What does the R² tell you?
c) You are given a t-value. What does it mean?
d) It is possible to find the correlation coefficient of the two variables from the information above. What is it?
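For part (d), simple regression has only one independent variable, so R² is the square of the correlation coefficient, and the correlation takes the sign of the slope. The sketch below applies that in Python. It also recomputes the t-value from r and the sample size using the standard identity t = r·sqrt(n − 2) / sqrt(1 − r²), as a consistency check on the reported numbers.

```python
from math import copysign, sqrt

r_squared = 0.155
slope = 3.137
n = 69

# The correlation is the square root of R², carrying the sign of the slope.
r = copysign(sqrt(r_squared), slope)

# Consistency check: in simple regression the slope's t-value can be
# recovered from r and n alone.
t_check = r * sqrt(n - 2) / sqrt(1 - r_squared)

print(f"r = {r:.4f}")
print(f"t recomputed from r and n = {t_check:.2f}")
```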

6. A teacher used a series of problems in a class that came from a variety of sources. After each set of problems, the students evaluated it in terms of usefulness, with 1 meaning very helpful and 5 meaning useless. The teacher wondered if the material from a prestigious school was better than the rest. He ran a regression using as the dependent variable the average student rating of the set of problems (remember, higher numbers mean less useful) and as an independent variable whether or not the problems came from the prestigious school (0 if from an ordinary school, 1 if from the prestigious school). Below are his results.

| Variable | Coefficient | Std. error | t-statistic |
|---|---|---|---|
| Constant | 2.285 | .034 | 67.071 |
| Prestige? | .214 | .057 | 3.780 |

R² = .212, n > 40
a) What was the average rating of the lessons from the ordinary schools?
b) What was the average rating of the lessons from the prestigious school?
c) Was the expectation of the teacher confirmed?
d) Suppose the claim was that the lessons from the prestigious school were just like the other lessons and that any differences are due to random chance. Does random chance look like a good explanation of the differences in the quality of the lessons as perceived by students? What number do we use to answer this?
e) How much of the variation in student evaluations did the teacher explain with this regression? Is this a lot or a little?

(Comment: This is a problem of comparing whether or not two means are the same. Here it is done with regression. It can also be done without regression using a two-sample t-test, a test that some introductory texts explain but which I have not included on this site. The results will be the same regardless of which method is used.)

(Use of a zero-one coding is common when we have an off-on situation. Variables with this coding are called dummy variables.)
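The claim in the comment above can be illustrated with a small sketch. The ratings below are hypothetical numbers invented for this example, not the teacher's data. The code regresses the rating on a 0/1 dummy and compares the result with a pooled-variance two-sample t-test: the constant equals the mean of the 0 group, the dummy's coefficient equals the difference in means, and the two t-values agree.

```python
from math import sqrt

# Hypothetical ratings (1 = very helpful, 5 = useless); NOT from the source.
ordinary = [2.1, 2.4, 2.3, 2.2, 2.5]
prestige = [2.6, 2.4, 2.7, 2.5]

# Stack the two groups and code the dummy: 0 = ordinary, 1 = prestigious.
x = [0] * len(ordinary) + [1] * len(prestige)
y = ordinary + prestige
n = len(y)

# Simple regression of rating on the dummy.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
t_regression = b / sqrt(sse / (n - 2) / sxx)

# Pooled-variance two-sample t-test on the same data.
m0 = sum(ordinary) / len(ordinary)
m1 = sum(prestige) / len(prestige)
ss0 = sum((v - m0) ** 2 for v in ordinary)
ss1 = sum((v - m1) ** 2 for v in prestige)
pooled_var = (ss0 + ss1) / (n - 2)
t_two_sample = (m1 - m0) / sqrt(pooled_var * (1 / len(ordinary) + 1 / len(prestige)))

print(f"constant = {a:.3f}, mean of ordinary group = {m0:.3f}")
print(f"dummy coefficient = {b:.3f}, difference in means = {m1 - m0:.3f}")
print(f"t (regression) = {t_regression:.3f}, t (two-sample) = {t_two_sample:.3f}")
```

The agreement holds for the equal-variance (pooled) version of the two-sample test; a test that does not pool the variances would generally give a slightly different t-value.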

7. Below are the results from a regression trying to predict the asking price of Cadillacs based on their mileage (measured in thousands of miles). (These data were taken from an issue of the Chicago Tribune a number of years ago.)

R Square = .603; Adjusted R Square = .591

| Variable | Regression Coefficient | Std. Error | t | Significance |
|---|---|---|---|---|
| Constant | 26303.415 | 1928.098 | 13.642 | .000 |
| miles | -226.465 | 32.478 | -6.973 | .000 |

a) How successful is our attempt to explain the prices of these cars? (Hint: Use R Square.)
b) If we have a Caddy that has 10,000 miles on it, what would we predict for its price?
c) The level of significance for miles is .000. What is the hypothesis being tested?
d) There is a problem with the regression. Miles and age tend to go together, with older cars having more miles. Perhaps we are capturing some of the effects of age when we include only miles. How do you think we could fix this problem?
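One detail in part (b) is easy to miss: mileage is measured in thousands of miles, so a car with 10,000 miles enters the equation as 10, not 10,000. A minimal sketch of the prediction in Python, with a hypothetical helper `predicted_price` that takes the odometer reading in miles and does the conversion:

```python
CONSTANT = 26303.415
MILES_COEF = -226.465   # change in asking price per 1,000 miles


def predicted_price(miles):
    """Predicted asking price for a car with the given odometer reading in miles."""
    thousands_of_miles = miles / 1000   # the regression uses thousands of miles
    return CONSTANT + MILES_COEF * thousands_of_miles


print(f"{predicted_price(10_000):.2f}")
```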

8. For the data below, compute the correlation coefficient for X and Y.

| Y | X |
|---|---|
| -1 | 0 |
| 1 | 1 |
| 0 | 2 |
| 2 | 1 |
| 3 | 6 |

Then compute the values of a and b in the regression equation Y = a + bX.
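The two parts of Problem 8 are closely linked: once the correlation r is known, the slope follows as r times the ratio of the standard deviation of Y to that of X. The sketch below, in Python, computes r, a, and b from the sums of squares and confirms that link. It is one way to check a hand calculation, and the variable names are illustrative.

```python
from math import sqrt

# Data from the table in Problem 8.
X = [0, 1, 2, 1, 6]
Y = [-1, 1, 0, 2, 3]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

r = sxy / sqrt(sxx * syy)     # correlation coefficient
b = sxy / sxx                 # slope
a = y_bar - b * x_bar         # intercept

# The slope can also be obtained from r: b = r * (sd of Y / sd of X).
b_from_r = r * sqrt(syy / sxx)

print(f"r = {r:.4f}, a = {a:.3f}, b = {b:.3f}, b from r = {b_from_r:.3f}")
```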