Lab: Correlation

Computer Exercises: Correlation

(This exercise was designed to work with SPSS. It may be modified to work with other programs, and it may have to be modified to work with the current version of SPSS.)

If we have a pair of variables that has no correlation whatsoever, and we take a sample from these variables, when we compute the sample correlation coefficient, we will rarely get zero. Instead we will get something close to zero. As the sample size gets bigger, we should usually be closer to zero than when the sample size is small. We can test this expectation using SPSS.

1. Using SPSS, construct ten columns of random numbers, each with 20 observations. Do a correlation matrix of all these numbers.

2. When you get the correlation coefficients, you get a 10x10 box, or 100 cells. The ten cells along the diagonal are meaningless--the correlation of a column with itself is 1. The other 90 actually are only 45 distinct correlations--the matrix is mirrored across the diagonal. Of these 45 separate sample correlation coefficients, how many are significant at the .05 level? How big must a sample correlation coefficient be in order to be significant at the .05 level? How many are significant at the .01 level? How big must a correlation coefficient be to be significant at the .01 level?

3. (This will take a bit of time) Enter the 45 sample correlation coefficients that you obtained and get a histogram of them. Is it centered on zero? What is the standard deviation?

4. Let's repeat the process, but this time let's make n=200. How many of the 45 sample correlation coefficients are significant at the .05 level? How big must a sample correlation coefficient be in order to be significant at the .05 level?

5. Enter these 45 sample correlation coefficients and get the histogram. Is it centered on zero? What is the standard deviation? What do you conclude about the effect of increasing the sample size?
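For readers adapting the exercise to a program other than SPSS, the steps above can be sketched in Python using only the standard library. This is an illustration, not the SPSS procedure: the function names, the seed, and the use of uniform random numbers are assumptions, and the critical value comes from the Fisher z approximation rather than the exact t-based test, so it may differ slightly from what SPSS reports.

```python
import itertools
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| (Fisher z approximation, not exact)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


def simulate(n, columns=10, seed=0):
    """Return the 45 distinct correlations among 10 unrelated random columns."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(n)] for _ in range(columns)]
    return [pearson_r(a, b) for a, b in itertools.combinations(data, 2)]


for n in (20, 200):
    rs = simulate(n)
    crit = approx_critical_r(n)
    significant = sum(abs(r) > crit for r in rs)
    print(f"n={n}: {len(rs)} correlations, "
          f"mean={statistics.fmean(rs):.3f}, sd={statistics.stdev(rs):.3f}, "
          f"approx critical |r|={crit:.3f}, significant at .05: {significant}")
```

Changing the seed gives a fresh set of 45 coefficients, which is the same as rerunning the exercise with new random columns.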

(Below is a variant of the above exercise.)

1. Today we are going to meet multivariate analysis, the analysis of more than one variable. We will start by generating three columns of numbers. Columns 1 and 2 are to be random numbers that you now know how to generate. Since it is easy to get them, let's generate 100. Column 3 will contain the sum of columns 1 and 2. (See if you can figure out how to do this using the Transform menu. Hint: V1=RV.UNIFORM(0,10), V2=RV.UNIFORM(0,10), V3=V1+V2.)

Pull down the Graph menu to "Scatter" and select simple graph. You will do three graphs: V1-V2, V1-V3, and V2-V3. Do you see any pattern in the data? Can you explain why the graphs look like they do? If you can, do it. If not, ask.

2. Graphs give impressions, but we measure whether there is pattern in the data with what is called the correlation coefficient. If there is no relationship, the correlation coefficient will be zero. If there is a direct relationship, correlation will be positive. If the relationship is inverse, the correlation will be negative. To get this measure, pull down the statistics menu to correlate. Select bivariate. Slide over all the variables. Now before you hit the OK button, think about what you expect.

Should there be a relationship (correlation) between V1 and V2? Explain.

Between V2 and V3? Between V1 and V3?

Now hit OK. How does what you expect compare to what you get?

You probably got a small correlation between V1 and V2. How do you explain why it is not zero?
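Outside SPSS, the three columns and their correlations could be produced along the following lines. This is a minimal sketch using Python's standard library: `make_columns`, `pearson_r`, and the seed are illustrative names and choices, while the ranges and the sum mirror the hint RV.UNIFORM(0,10) and V3=V1+V2.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_columns(n, seed=1):
    """V1 and V2 are independent uniform(0, 10) draws; V3 is their sum."""
    rng = random.Random(seed)
    v1 = [rng.uniform(0, 10) for _ in range(n)]
    v2 = [rng.uniform(0, 10) for _ in range(n)]
    v3 = [a + b for a, b in zip(v1, v2)]
    return v1, v2, v3


v1, v2, v3 = make_columns(100)
print("r(V1, V2) =", round(pearson_r(v1, v2), 3))
print("r(V1, V3) =", round(pearson_r(v1, v3), 3))
print("r(V2, V3) =", round(pearson_r(v2, v3), 3))
```

Calling `make_columns(200)` corresponds to adding another 100 observations.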

3. There is a level of significance given, which means that a hypothesis (or claim) has been tested. What was the hypothesis that was being tested? What is the alternative? (Hint: should we start by assuming there is no pattern and that all is random, or that there is a pattern?)

4. What is the level of significance telling you? It is saying that if the true correlation is zero, the probability of getting sample results like the one you are getting for V1-V2 is ___________.

When you are finished, put your results on the board and we will discuss them.

5. Add another 100 observations and see how things change.

6. Instead of generating random numbers that are uniformly distributed, we can generate numbers that are normally distributed. We do this by setting:

VAR1 = RV.NORMAL(mean,stdev)

So if we want numbers with a mean of 10 and a standard deviation of 5, our equation is:

VAR1 = RV.NORMAL(10,5)

Repeat the steps above but with random numbers generated in this way instead of the uniform generation. What differences do you see?

7. You can also test the hypothesis that the mean of a column is 10 (or whatever you made the mean). If you put 10 in as the test value, should you reject the null hypothesis very often? Try it.

8. If you test the claim that the mean of a column is 12, you should usually reject that. Put 12 in as the test value and see what happens.
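The two mean tests in steps 7 and 8 can be repeated many times in a short simulation to see how often each null hypothesis is rejected. The sketch below is an assumption-laden illustration, not SPSS output: it assumes columns of 100 observations, uses 1.984 as an approximate two-tailed 5% critical value of t with 99 degrees of freedom, and the function names and seed are made up for the example.

```python
import math
import random
import statistics

# Approximate two-tailed 5% critical value of t with 99 degrees of freedom
# (an assumption tied to n = 100; change it if n changes).
T_CRIT_DF99 = 1.984


def t_statistic(sample, test_value):
    """One-sample t statistic for the hypothesis that the mean is test_value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - test_value) / standard_error


def rejection_rate(test_value, trials=1000, n=100, mean=10, sd=5, seed=2):
    """Share of simulated normal columns for which the test rejects at .05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        if abs(t_statistic(sample, test_value)) > T_CRIT_DF99:
            rejections += 1
    return rejections / trials


print("test value 10:", rejection_rate(10))
print("test value 12:", rejection_rate(12))
```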

9. Correlation measures linear relationships. You might wonder what it does when it faces a non-linear pattern. Let us see if we can find out.

Create a series, x, of random numbers between -1 and 1. (The command in Transform is RV.UNIFORM(-1,1).) Then create the series y = x**2 (y = x squared).

(You should have played with the transform menu enough by now so you can do this.)

Graph the two series with a scatter diagram (under legacy graphs or diagrams.) What do you see?

10. Now compute the correlation coefficient for x and y. What is it? Consider the claim that there is no relationship between these two variables. Does it appear that whatever correlation you found is random noise, or that there is something real going on? Explain.
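The same experiment can be sketched outside SPSS as follows. The sample size of 100, the seed, and the function names are assumptions, since the exercise does not fix them. Here y is completely determined by x, so the value of r that is printed can be compared directly with what the scatter diagram shows.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def parabola_correlation(n=100, seed=5):
    """Correlation between x ~ uniform(-1, 1) and y = x squared."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [value ** 2 for value in x]
    return pearson_r(x, y)


print("r(x, y) =", round(parabola_correlation(), 3))
```

Trying several seeds shows how much the coefficient moves from one random sample to the next.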

1. Are you psychic? Let's find out. Pair up with a partner. Each of you think of a number between 1 and 20. Each of you write down your number, without telling the other person what you have done. Repeat this process 26 times. You should now have a data set of 26 pairs of numbers.

Enter these numbers into SPSS. Label your columns. Pull down the Graph menu to "Scatter" and select simple graph. Do you see any pattern in the data? (This is the inter-ocular test--does anything hit you between the eyes?)

2. Graphs give impressions, but we can measure whether you and your partner have a psychic link with the correlation coefficient. If either of you is psychic, there should be a correlation between the two sets of numbers. To get this measure, pull down the statistics menu to correlate. Select bivariate. Enter both variables. Did you get zero?

3. If you did not get zero, does this indicate that there is some psychic ability?

4. There is a level of significance given, which means that a hypothesis has been tested. What was the hypothesis that was being tested? What is the alternative?

5. What is the level of significance telling you? It is saying that if the true correlation is zero, the probability of getting sample results like the ones you are getting is ___________.

6. Does this level of significance suggest that you or your partner is psychic? Explain why or why not.

7. If you have time, add another 26 observations (total 52) and see how things change.
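One way to see what the level of significance means here is a permutation test, sketched below. This is a swapped-in technique for illustration and not the test SPSS runs (SPSS reports a t-based significance level). The two lists are randomly generated stand-ins for a pair's 26 guesses, and all names and seeds are assumptions.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, shuffles=2000, seed=3):
    """Share of random re-pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(shuffles):
        rng.shuffle(y_shuffled)  # break any link between the two columns
        if abs(pearson_r(x, y_shuffled)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / shuffles


# Hypothetical stand-in data: replace with the 26 pairs you wrote down.
rng = random.Random(4)
you = [rng.randint(1, 20) for _ in range(26)]
partner = [rng.randint(1, 20) for _ in range(26)]

print("r =", round(pearson_r(you, partner), 3))
print("permutation p-value =", permutation_p_value(you, partner))
```

The printed proportion plays the same role as the blank in question 5: it estimates how often unrelated guesses would produce a correlation at least as large as the one observed.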

Below are some data for the average annual rates of money growth, inflation, and real income growth of several countries for the years 1979-1984. Enter them into a statistical program, look at scatter diagrams, and compute correlations. Is there anything more than randomness here?

Country          Money Growth (%)   Inflation (%)   Real Income Growth (%)

United States           7.5               6.5                 1.8
Belgium                 3.0               5.5                 1.0
Italy                  13.0              19.5                 1.8
Japan                   4.0               2.2                 3.9
Finland                12.1               9.5                 3.4
South Africa           30.5              14.9                 2.5
Bolivia               220.3             205.9                -2.3
Ecuador                25.2              25.5                 2.2
Honduras                9.2               7.1                 0.7
Mexico                 45.0              52.3                 2.7
Peru                   67.6              81.1                -0.3
Venezuela              15.3              12.9                -1.7
Cyprus                 14.9              10.0                 5.5
Bangladesh             18.2              11.4                 3.3
Sri Lanka              16.8              18.0                 5.1
India                  15.6               8.8                 5.5
Korea                  15.8              10.7                 5.8
Malaysia                9.5               4.2                 6.9
Nepal                  14.3               8.4                 2.8
Singapore               9.2               5.3                 8.6
Nigeria                14.7               9.8                -2.9

Source: Data taken from Review, The Federal Reserve Bank of St. Louis, Vol. 70, No. 3 (May/June 1988), p. 15. Their data contained 62 countries. The above table shows every third country when ranked by rate of inflation.
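If you are not using SPSS, the table can be entered and the three correlations computed with a short script such as the one below. It is only a sketch: the values are copied from the table, while the variable and function names are illustrative. It computes the coefficients but does not test their significance or draw the scatter diagrams.

```python
import math
import statistics

# (country, money growth %, inflation %, real income growth %), 1979-1984
rows = [
    ("United States", 7.5, 6.5, 1.8), ("Belgium", 3.0, 5.5, 1.0),
    ("Italy", 13.0, 19.5, 1.8), ("Japan", 4.0, 2.2, 3.9),
    ("Finland", 12.1, 9.5, 3.4), ("South Africa", 30.5, 14.9, 2.5),
    ("Bolivia", 220.3, 205.9, -2.3), ("Ecuador", 25.2, 25.5, 2.2),
    ("Honduras", 9.2, 7.1, 0.7), ("Mexico", 45.0, 52.3, 2.7),
    ("Peru", 67.6, 81.1, -0.3), ("Venezuela", 15.3, 12.9, -1.7),
    ("Cyprus", 14.9, 10.0, 5.5), ("Bangladesh", 18.2, 11.4, 3.3),
    ("Sri Lanka", 16.8, 18.0, 5.1), ("India", 15.6, 8.8, 5.5),
    ("Korea", 15.8, 10.7, 5.8), ("Malaysia", 9.5, 4.2, 6.9),
    ("Nepal", 14.3, 8.4, 2.8), ("Singapore", 9.2, 5.3, 8.6),
    ("Nigeria", 14.7, 9.8, -2.9),
]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


money = [row[1] for row in rows]
inflation = [row[2] for row in rows]
growth = [row[3] for row in rows]

print("money growth vs inflation:   ", round(pearson_r(money, inflation), 3))
print("money growth vs real growth: ", round(pearson_r(money, growth), 3))
print("inflation vs real growth:    ", round(pearson_r(inflation, growth), 3))
```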
