Computer Exercises: Correlation
(This exercise was designed to work with SPSS. It may be
modified to work with other programs, and it may have to be
modified to work with the current version of SPSS.)
If we have a pair of variables that has no correlation
whatsoever, and we take a sample from these variables, when
we compute the sample correlation coefficient, we will
rarely get zero. Instead we will get something close to
zero. As the sample size gets bigger, we should usually be
closer to zero than when the sample size is small. We can
test this expectation using SPSS.
1. Using SPSS, construct ten columns of random numbers,
each with 20 observations. Do a correlation matrix of all
these numbers.
2. When you get the correlation coefficients, you get a
10x10 box, or 100 cells. The ten cells along the diagonal
are meaningless--the correlation of a column with itself is
1. The other 90 actually are only 45 distinct
correlations--the matrix is mirrored across the diagonal. Of
these 45 separate sample correlation coefficients, how many
are significant at the .05 level? How big must a sample
correlation coefficient be in order to be significant at the
.05 level? How many are significant at the .01 level? How
big must a correlation coefficient be to be significant at
the .01 level?
3. (This will take a bit of time) Enter the 45 sample
correlation coefficients that you obtained and get a
histogram of them. Is it centered on zero? What is the
standard deviation?
4. Let's repeat the process, but this time let's make
n=200. How many of the 45 sample correlation coefficients
are significant at the .05 level? How big must a sample
correlation coefficient be in order to be significant at the
.05 level?
5. Enter these 45 sample correlation coefficients and get
the histogram. Is it centered on zero? What is the standard
deviation? What do you conclude about the effect of
increasing the sample size?
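For readers without SPSS, the steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the SPSS procedure. The function names (pearson_r, approx_p_value, pairwise_correlations) and the fixed seed are our own. The columns are uniform random numbers. Significance is judged with the Fisher z approximation rather than the t test a statistics package would typically report, so counts near the cutoff may differ slightly.

```python
import itertools
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: true correlation = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def pairwise_correlations(n_obs, n_cols=10, seed=0):
    """Correlations for all distinct pairs of n_cols independent random columns."""
    rng = random.Random(seed)  # assumption: fixed seed so runs are repeatable
    cols = [[rng.random() for _ in range(n_obs)] for _ in range(n_cols)]
    return [pearson_r(a, b) for a, b in itertools.combinations(cols, 2)]


# Compare the small-sample and large-sample versions of the exercise.
for n in (20, 200):
    rs = pairwise_correlations(n)
    sig05 = sum(approx_p_value(r, n) < 0.05 for r in rs)
    sig01 = sum(approx_p_value(r, n) < 0.01 for r in rs)
    print(f"n={n}: {len(rs)} correlations, mean={statistics.fmean(rs):+.3f}, "
          f"sd={statistics.stdev(rs):.3f}, "
          f"significant at .05: {sig05}, at .01: {sig01}")
```

With no true correlation, about 5% of the 45 coefficients (two or so) should come out significant at the .05 level purely by chance, and the spread of the coefficients should shrink roughly like 1/sqrt(n-1) as n grows.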
(Below is a variant of the above exercise.)
1. Today we are going to meet multivariate analysis, the
analysis of more than one variable. We will start by
generating three columns of numbers. Columns 1 and 2 are to
be random numbers that you now know how to generate. Since
it is easy to get them, let's generate 100. Column 3 will
contain the sum of columns 1 and 2. (See if you can figure
out how to do this using the Transform menu. Hint:
V1=RV.UNIFORM(0,10), V2=RV.UNIFORM(0,10), V3=V1+V2.)
Pull down the Graph menu to "Scatter" and select simple
graph. You will do three graphs: V1-V2, V1-V3, and V2-V3. Do
you see any pattern in the data? Can you explain why the
graphs look like they do? If you can, do it. If not,
ask.
2. Graphs give impressions, but we measure whether there
is a pattern in the data with what is called the correlation
coefficient. If there is no relationship, the correlation
coefficient will be zero. If there is a direct relationship,
correlation will be positive. If the relationship is
inverse, the correlation will be negative. To get this
measure, pull down the statistics menu to correlate. Select
bivariate. Slide over all the variables. Now before you hit
the OK button, think about what you expect.
Should there be a relationship (correlation) between V1
and V2? Explain.
Between V2 and V3? Between V1 and V3?
Now hit OK. How does what you expect compare to what you
get?
You probably got a small correlation between V1 and V2.
How do you explain why it is not zero?
3. There is a level of significance given, which means
that a hypothesis (or claim) has been tested. What was the
hypothesis that was being tested? What is the alternative?
(Hint: should we start by assuming there is no pattern and
that all is random, or that there is a pattern?)
4. What is the level of significance telling you? It is
saying that if the true correlation is zero, the probability
of getting sample results like the one you are getting for
V1-V2 is ___________.
When you are finished, put your results on the board and
we will discuss them.
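The three-column setup from steps 1 and 2 can also be tried outside SPSS. The following is a rough Python sketch using only the standard library. The helper names (pearson_r, simulate_sum_columns) and the seed are our own assumptions, and random.uniform stands in for RV.UNIFORM.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_sum_columns(n_obs=100, seed=1):
    """V1 and V2 are independent uniform(0, 10) columns; V3 is their sum."""
    rng = random.Random(seed)  # assumption: fixed seed so runs are repeatable
    v1 = [rng.uniform(0, 10) for _ in range(n_obs)]
    v2 = [rng.uniform(0, 10) for _ in range(n_obs)]
    v3 = [a + b for a, b in zip(v1, v2)]
    return v1, v2, v3


v1, v2, v3 = simulate_sum_columns()
print(f"r(V1, V2) = {pearson_r(v1, v2):+.3f}")
print(f"r(V1, V3) = {pearson_r(v1, v3):+.3f}")
print(f"r(V2, V3) = {pearson_r(v2, v3):+.3f}")
```

Because V1 and V2 are independent and have the same variance, the true correlation between either one and their sum is 1/sqrt(2), about 0.71, while the true correlation between V1 and V2 is zero. The sample values will wander around those targets.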
5. Add another 100 observations and see how things
change.
6. Instead of generating random numbers that are
uniformly distributed, we can generate numbers that are
normally distributed. We do this by setting:
VAR1 = RV.NORMAL(mean,stdev)
So if we want numbers with a mean of 10 and a standard
deviation of 5, our equation is:
VAR1 = RV.NORMAL(10,5)
Repeat the steps above but with random numbers generated
in this way instead of the uniform generation. What
differences do you see?
7. You can also test the hypothesis that the mean of a
column is 10 (or whatever you made the mean). If you put 10
in as the test value, should you reject the null hypothesis
very often? Try it.
8. If you test the claim that the mean of a column is 12,
you should usually reject that. Put 12 in as the test value
and see what happens.
9. Correlation measures linear relationships. You might
wonder what it does when it faces a non-linear pattern. Let
us see if we can find out.
Create a series, x, of random numbers between -1 and 1.
(The command in Transform is RV.UNIFORM(-1,1).) Then create
the series y = x**2 (y = x-squared).
(You should have played with the transform menu enough by
now so you can do this.)
Graph the two series with a scatter diagram (under legacy
graphs or diagrams.) What do you see?
10. Now compute the correlation coefficient for x and y.
What is it? Consider the claim that there is no relationship
between these two variables. Does it appear that whatever
correlation you found is random noise, or that there is
something real going on? Explain.
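The non-linear case in steps 9 and 10 can be sketched the same way. The following minimal Python example (standard library only) uses helper names and a seed of our own choosing. It uses 1000 observations, an arbitrary choice, so that sampling noise is small. As an extra check that is not part of the exercise, it also correlates y with the absolute value of x.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_parabola(n_obs=1000, seed=2):
    """x is uniform on (-1, 1); y is exactly x squared."""
    rng = random.Random(seed)  # assumption: fixed seed so runs are repeatable
    x = [rng.uniform(-1, 1) for _ in range(n_obs)]
    y = [a ** 2 for a in x]
    return x, y


x, y = simulate_parabola()
abs_x = [abs(a) for a in x]
print(f"r(x, y)   = {pearson_r(x, y):+.3f}")      # linear measure misses the curve
print(f"r(|x|, y) = {pearson_r(abs_x, y):+.3f}")  # folding x exposes the pattern
```

Here y is completely determined by x, yet the correlation between them should come out near zero, because the upward and downward halves of the parabola cancel in a linear measure.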
1. Are you psychic? Let's find out. Pair up with a partner.
Each of you thinks of a number between 1 and 20 and writes it
down without telling the other person what it is. Repeat this
process 26 times. You should now have a data set of 26 pairs
of numbers.
Enter these numbers into SPSS. Label your columns. Pull
down the Graph menu to "Scatter" and select simple graph. Do
you see any pattern in the data? (This is the inter-ocular
test--does anything hit you between the eyes?)
2. Graphs give impressions, but we can measure whether you
and your partner have a psychic link. If either of you is
psychic, there should be a correlation between the two sets
of numbers. To get this measure, pull down the statistics
menu to correlate. Select bivariate. Enter both variables.
Did you get zero?
3. If you did not get zero, does this indicate that there
is some psychic ability?
4. There is a level of significance given, which means
that a hypothesis has been tested. What was the hypothesis
that was being tested? What is the alternative?
5. What is the level of significance telling you? It is
saying that if the true correlation is zero, the probability
of getting sample results like the ones you are getting is
___________.
6. Does this level of significance suggest that you or
your partner is psychic? Explain why or why not.
7. If you have time, add another 26 observations (total
52) and see how things change.
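One way to get a feel for the blank in step 5 is to simulate many pairs of non-psychic partners and see how often chance alone produces a correlation as large as yours. The following Python sketch (standard library only) does this by Monte Carlo simulation, which is a different technique from the test SPSS reports. It assumes each partner picks numbers uniformly and independently from 1 to 20, which real people may not do. The function names, the seed, the number of trials, and the example value 0.30 are all hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_guess_r(n_pairs, rng):
    """Correlation for two partners guessing independently (no psychic link)."""
    a = [rng.randint(1, 20) for _ in range(n_pairs)]
    b = [rng.randint(1, 20) for _ in range(n_pairs)]
    return pearson_r(a, b)


def null_exceedance_probability(r_observed, n_pairs=26, trials=5000, seed=3):
    """Share of simulated no-link sessions with |r| at least |r_observed|."""
    rng = random.Random(seed)  # assumption: fixed seed so runs are repeatable
    hits = sum(abs(random_guess_r(n_pairs, rng)) >= abs(r_observed)
               for _ in range(trials))
    return hits / trials


r_observed = 0.30  # hypothetical value; replace with the r from your own data
p = null_exceedance_probability(r_observed)
print(f"Chance of |r| >= {r_observed} with no psychic link: about {p:.3f}")
```

Calling null_exceedance_probability with n_pairs=52 gives the corresponding figure for the longer session in step 7.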
Below are some data on the average annual rates of money growth,
inflation, and real income growth of several countries for
the years 1979-1984. Enter them into a statistical program,
look at scatter diagrams, and compute correlations. Is
there anything more than randomness here?
Country       | Money Growth (%) | Inflation (%) | Real Income Growth (%)
United States | 7.5              | 6.5           | 1.8
Belgium       | 3.0              | 5.5           | 1.0
Italy         | 13.0             | 19.5          | 1.8
Japan         | 4.0              | 2.2           | 3.9
Finland       | 12.1             | 9.5           | 3.4
South Africa  | 30.5             | 14.9          | 2.5
Bolivia       | 220.3            | 205.9         | -2.3
Ecuador       | 25.2             | 25.5          | 2.2
Honduras      | 9.2              | 7.1           | 0.7
Mexico        | 45.0             | 52.3          | 2.7
Peru          | 67.6             | 81.1          | -0.3
Venezuela     | 15.3             | 12.9          | -1.7
Cyprus        | 14.9             | 10.0          | 5.5
Bangladesh    | 18.2             | 11.4          | 3.3
Sri Lanka     | 16.8             | 18.0          | 5.1
India         | 15.6             | 8.8           | 5.5
Korea         | 15.8             | 10.7          | 5.8
Malaysia      | 9.5              | 4.2           | 6.9
Nepal         | 14.3             | 8.4           | 2.8
Singapore     | 9.2              | 5.3           | 8.6
Nigeria       | 14.7             | 9.8           | -2.9
Source: Data taken from Review, The
Federal Reserve Bank of St. Louis, Vol. 70, No. 3 (May/June
1988), p. 15. Their data contained 62 countries. The above
table shows every third country when ranked by rate of
inflation.
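For those working outside a statistics package, the three correlations for the table above can be computed with a short Python sketch using only the standard library. The names DATA, pearson_r, and correlation_table are our own. Recomputing without Bolivia is an extra sensitivity check that we have added, since one extreme point can dominate a correlation.

```python
import math
import statistics

# (country, money growth %, inflation %, real income growth %), 1979-1984
DATA = [
    ("United States", 7.5, 6.5, 1.8), ("Belgium", 3.0, 5.5, 1.0),
    ("Italy", 13.0, 19.5, 1.8), ("Japan", 4.0, 2.2, 3.9),
    ("Finland", 12.1, 9.5, 3.4), ("South Africa", 30.5, 14.9, 2.5),
    ("Bolivia", 220.3, 205.9, -2.3), ("Ecuador", 25.2, 25.5, 2.2),
    ("Honduras", 9.2, 7.1, 0.7), ("Mexico", 45.0, 52.3, 2.7),
    ("Peru", 67.6, 81.1, -0.3), ("Venezuela", 15.3, 12.9, -1.7),
    ("Cyprus", 14.9, 10.0, 5.5), ("Bangladesh", 18.2, 11.4, 3.3),
    ("Sri Lanka", 16.8, 18.0, 5.1), ("India", 15.6, 8.8, 5.5),
    ("Korea", 15.8, 10.7, 5.8), ("Malaysia", 9.5, 4.2, 6.9),
    ("Nepal", 14.3, 8.4, 2.8), ("Singapore", 9.2, 5.3, 8.6),
    ("Nigeria", 14.7, 9.8, -2.9),
]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(rows):
    """Pairwise correlations among the three numeric columns."""
    money = [row[1] for row in rows]
    inflation = [row[2] for row in rows]
    income = [row[3] for row in rows]
    return {
        "money-inflation": pearson_r(money, inflation),
        "money-income": pearson_r(money, income),
        "inflation-income": pearson_r(inflation, income),
    }


without_bolivia = [row for row in DATA if row[0] != "Bolivia"]
for label, rows in (("all 21 countries", DATA),
                    ("without Bolivia", without_bolivia)):
    print(label)
    for pair, r in correlation_table(rows).items():
        print(f"  r({pair}) = {r:+.3f}")
```

Comparing the two sets of results shows how much of each correlation depends on a single country.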