Problems: Correlation 2

12. Calculate the correlation coefficient from these observations:

X Y X² Y² YX

2 1

6 2

2 1

3 2

7 4

(Note: It is probably useful to calculate one or two correlations without a computer to see what is involved, but after that it is best to use a computer program. Spreadsheets will do this, and there are even some webpages on the Internet that do the calculation. See, for example, http://www.wessa.net/corr.wasp.)
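As one illustration of what such a program does, here is a minimal Python sketch of the computational formula built from the column totals in the table above (the sums of X, Y, X², Y², and YX). The function name pearson_r is made up for this example, and the formula in the book may be arranged differently, though it should give the same value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the column sums (computational formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)               # column X²
    sum_y2 = sum(y * y for y in ys)               # column Y²
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # column YX
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# The five observations from problem 12.
x = [2, 6, 2, 3, 7]
y = [1, 2, 1, 2, 4]
r = pearson_r(x, y)
print(round(r, 3))
```

Working the same five columns by hand and comparing with the printed value is a quick way to check the arithmetic.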

13. Two variables are inversely correlated but only weakly. What sort of correlation coefficient would give us this information?

14. One question of interest to teachers is how much emphasis should be given to student evaluations of teachers. In considering this question a number of years ago, a couple of professors measured the amount learned in different discussion sections of a large lecture class on a scale of one to five, with five being the highest amount learned. They then looked at the class evaluations of the teachers, also on a one to five scale, with five as the highest evaluation possible. The correlation between these two variables was -0.75. What does this result mean?

15. You should be able to tell what the correlation coefficient of these five pairs is without using a formula. What is it?

X:	1	2	3	4	5
Y:	5	4	3	2	1

Compute it with the formula in the book and see if you get the right answer.

16. How many pairs of numbers must you have in order to get a correlation coefficient that is not 1, -1, or 0?

(It is useful to calculate the correlation coefficient at least once. However, no one who is doing useful work calculates it by hand; they use statistical programs. Here is a simple correlation coefficient calculator from the Internet: http://www.easycalculation.com/statistics/correlation.php.)

___ 17. The null hypothesis when computing a correlation coefficient is that there is:

a) no correlation.

b) no causation.

c) a positive correlation.

d) some correlation, either positive or negative.

___ 18. What is another term for "level of significance"?

a) t-value.

b) s-coefficient.

c) p-value.

d) q-score.

___ 19. A researcher has found a negative correlation between two variables with a level of significance of .92. What can we conclude?

a) There is a strong positive association between the variables.

b) There is a strong negative association between the variables.

c) There may be no meaningful association between the variables.

d) There is a strong association but we cannot tell if it is positive or negative.

___ 20. You have two variables, X and Y, and you want to predict the value of Y based on the value of X. You can do this with:

a) either correlation or regression.

b) correlation, but not with regression.

c) regression, but not with correlation.

d) neither correlation nor regression.

21. What is the correlation between X and Y? (Find the correlation coefficient.)

X Y X² Y² YX

1 10 1 100 10

3 6 9 36 18

6 4 36 16 24

7 3 49 9 21

8 2 64 4 16

Totals: 25 25 159 165 89

What are 90% confidence limits for r?
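The problem does not say which method to use for the confidence limits. One common approach is the Fisher z-transformation, sketched below in Python as an assumption rather than as the book's method. The helper names r_from_totals and fisher_limits are hypothetical, and r is recomputed from the totals row of the table above. With only five pairs the resulting interval is very rough.

```python
import math
from statistics import NormalDist

def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from the column totals of an X, Y, X², Y², YX table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

def fisher_limits(r, n, level=0.90):
    """Approximate confidence limits for r (assumed method: Fisher z)."""
    z = math.atanh(r)                  # transform r to the z scale
    std_error = 1 / math.sqrt(n - 3)   # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal cutoff
    # Transform both limits back to the correlation scale.
    return math.tanh(z - crit * std_error), math.tanh(z + crit * std_error)

# Totals from the table in problem 21 (n = 5 pairs).
r = r_from_totals(5, 25, 25, 159, 165, 89)
lower, upper = fisher_limits(r, 5)
print(round(r, 3), round(lower, 3), round(upper, 3))
```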

22. An insurance agent wishes to examine the relationship between income and the amount of life insurance held by heads of families. He draws a random sample of ten family heads and obtains the following data:

Family    Amount of Life Insurance ($0000 omitted)    Income ($0000 omitted)

A 9 4

B 20 8

C 22 9

D 15 8

E 17 8

F 30 12

G 18 6

H 25 10

I 10 6

J 20 9

a) The correlation coefficient for this data is .92. Explain what this number means.
b) A test statistic for the correlation coefficient is

t = (r - 0)/sqrt((1-r²)/(n-2))

where r is the correlation coefficient and n is the number of observations. (The denominator is the standard error of the correlation coefficient. This tests the claim that the true correlation coefficient is zero.) The t-statistic has n-2 degrees of freedom. Use this formula to test the hypothesis that there is no relationship between income and amount of life insurance.
c) In addition to the Pearson correlation coefficient, there are a couple of other correlation measurements. One is the rank-order correlation coefficient. It is used when the data may be related in a non-linear manner or when the observations can be ranked but not measured. To compute it, rank both the amount of life insurance and income from lowest to highest and use these ranks to compute a correlation coefficient. How different is this new correlation from the Pearson coefficient?
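The ranking procedure in part c) can be sketched in Python as follows. The problem does not say how to treat tied values; giving tied observations the average of the ranks they occupy is an assumption made here, and the function names are invented for the example.

```python
import math

def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Families A through J from the table in problem 22.
insurance = [9, 20, 22, 15, 17, 30, 18, 25, 10, 20]
income = [4, 8, 9, 8, 8, 12, 6, 10, 6, 9]

pearson = pearson_r(insurance, income)
rank_order = pearson_r(average_ranks(insurance), average_ranks(income))
print(round(pearson, 3), round(rank_order, 3))
```

The rank-order coefficient is simply the Pearson formula applied to the ranks, so the same function serves for both numbers.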

(Correlation to t-value calculators are available on the Internet. See, for example, http://faculty.vassar.edu/lowry/tabs.html#t.)
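For readers who would rather script the conversion than use a web calculator, here is a minimal Python sketch of the test statistic from part b). The function name t_statistic is hypothetical. The result still has to be compared with a t-table value at n-2 degrees of freedom, which this sketch does not look up.

```python
import math

def t_statistic(r, n):
    """t = (r - 0) / sqrt((1 - r²) / (n - 2)), with n - 2 degrees of freedom."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return (r - 0) / standard_error

# Values given in problem 22: r = .92 from ten family heads.
t = t_statistic(0.92, 10)
degrees_of_freedom = 10 - 2
print(round(t, 2), degrees_of_freedom)
```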

23. Researchers working with time series, that is, data that is taken at different points of time, worry about serial correlation. Serial correlation occurs when past values are related to present values. It can be a problem because some statistical procedures assume that each observation is independent of the others. With serial correlation there is dependency; what has happened in the past changes the probability of what will happen in the future.

A way to test for serial correlation is to lag the series to create a new series, and then compute the correlation coefficient between the two series. What is the serial correlation of this series: 0, 1, 2, 0, -1, -2, 0? Though tests of significance are dangerous with such a small sample size, test the hypothesis that the true serial correlation is zero. (Let alpha = .05.)
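The lagging step can be sketched in Python as below. The problem does not state the lag length; a lag of one period is assumed here, which leaves six pairs from the seven values. The function names are invented for the example, and the t-statistic reuses the formula from problem 22.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the column sums (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

def lagged_pairs(series, lag=1):
    """Pair each value with the value `lag` periods later (assumed lag of 1)."""
    return series[:-lag], series[lag:]

series = [0, 1, 2, 0, -1, -2, 0]
earlier, later = lagged_pairs(series)
r = pearson_r(earlier, later)

# Test statistic for the hypothesis that the true correlation is zero.
n = len(earlier)
t = r / math.sqrt((1 - r ** 2) / (n - 2))
print(round(r, 3), round(t, 3), n - 2)
```

The printed t-value is then compared with the two-tailed t-table entry for alpha = .05 at the printed degrees of freedom.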

Find the correlation coefficient for these variables:

Advertising Sales

$600 $5000

400 4000

800 7000

200 3000

500 6000
