Computer Exercises: Hypothesis Testing
(This exercise was designed for use with SPSS. It may be
modified to work with other programs, and it may have to be
modified to work with the current version of SPSS.)
Testing Random Numbers
1. For your next paper
you need to sample the books in the library. The best way to
do this is to take a true random sample, and we can do this
because each book has a number assigned to it between 1 and
138290. We will use SPSS to generate 100 random numbers in
this range and they will help us get our random sample. But
since we need to go to the lab and get these numbers, we
might as well do a few other things as well and maybe learn
something in the process. (If you try to do this with no
thought, it is a waste of time. Take your time and try to
understand why you are getting the results you are
getting.)
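For readers working outside SPSS, the sampling idea above can be sketched in Python. This is only an illustration: it assumes book numbers run from 1 to 138290 as stated, the function name and seed are made up for the example, and it samples without replacement, which is not quite the same as drawing independent uniform numbers in SPSS.

```python
import random

N_BOOKS = 138290  # highest book number, from the exercise text

def draw_book_sample(n=100, seed=None):
    """Return n distinct book numbers chosen at random from 1..N_BOOKS."""
    rng = random.Random(seed)  # seed is optional, only for repeatability
    return sorted(rng.sample(range(1, N_BOOKS + 1), n))

sample = draw_book_sample(100, seed=1)
print(sample[:10])  # first few book numbers to pull from the shelves
```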
1. Open SPSS and generate 100 random numbers uniformly
distributed between 0 and 138290. (You should remember how
to do this. If not, look at your old assignments. If you
still do not know, ask.) Call this column r1, set it up so
no decimals show, and print it. (You will need this page for
taking your random sample.)
2. Now let us generate seven other columns of random
numbers between 0 and 138290. (From the data window, choose
Transform, Compute, and then type in a new name in the
target variable box: r2, r3, r4, etc.) When you are
finished, you should have eight samples taken from a uniform
distribution.
3. What is the mean of the distribution you took these
samples from? __________
4. Goldilocks said to herself, "I think 80,000 is too
big." Let's test the hypothesis that the true mean is 80000
and see if Goldilocks is correct. In this case, we can test
it with 8 samples. Pull down Analyze to Compare Means to One
Sample T test, click over all the random variable columns
you have created, put in 80000 in the test value box, and
hit OK. If you set the critical level of significance at
.05, in how many of the samples would you reject the
hypothesis that the true mean is 80000? Explain.
5. If we set alpha (the critical level of significance)
at .01, in how many of the samples would you reject the
hypothesis that the true mean is 80000?
6. Then Goldilocks said, "I think 50000 is too small."
Use all eight samples to test the hypothesis that the true
mean is 50000. If we set alpha to .05, how many of the
samples reject the claim that 50000 is the true mean? How
about if we set alpha to .01? Is Goldilocks right that 50000
is too small?
7. Finally Goldilocks said, "I think 69145 is just
right." Let us test her claim. Do any of the eight samples
you have reject it if alpha is set to .05? How about if
alpha is set to .01?
8. The standard error of the mean is an estimate of how much
the sample means will vary from sample to sample. Let us see
if it looks right. Take the eight sample means that you have
and put them in a new column. (I do not know of any
automatic way to do this, but there may be one.) Compute the
descriptive statistics for this column. About what should
your mean be? Is the standard deviation that you get similar
to the standard errors of the mean that were reported in
your t-tests?
9. The next chapter will introduce something called
confidence intervals. We can compute them, and since we
probably have some time, let us do that. Go back to the
t-test and put in a test value of zero. Notice that the
resulting confidence intervals have a lower value and an
upper value. In how many of them does the true mean of 69145
fall?
10. We can compute intervals that we have more confidence
in by making them wider. Go back to the t-test and select
the options button. In the dialog box that comes up, you can
set the confidence intervals to 99%. Do so and then hit OK.
The true mean should appear in 99% of confidence intervals
that we have constructed in this way, so there should not be
more than one or two confidence intervals of the entire
class that do not have the true mean. Does the true mean
fall in all of your confidence intervals?
11. If you have time, redo your columns of random numbers
so they have 400 values instead of 100. What do you expect
will happen if we redo all the results above? Try it and see
if you are right.
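The steps in this exercise can also be tried outside SPSS. The following Python sketch is an assumption-laden stand-in, not the SPSS procedure itself: it builds eight columns of 100 uniform numbers, computes the t statistic for each of the three claims, and counts how often the size of t exceeds 2, which is only a rough rule of thumb for the .05 level. The function names and the seed are illustrative.

```python
import math
import random
import statistics

UPPER = 138290  # upper end of the uniform range used in the exercise

def t_statistic(sample, claim):
    """(sample mean - claim) / (sample SD / sqrt(n))."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - claim) / se

def simulate(n_columns=8, n_rows=100, seed=0):
    """Return n_columns lists of n_rows uniform numbers on [0, UPPER]."""
    rng = random.Random(seed)
    return [[rng.uniform(0, UPPER) for _ in range(n_rows)]
            for _ in range(n_columns)]

columns = simulate()
for claim in (80000, 50000, 69145):
    t_values = [t_statistic(col, claim) for col in columns]
    # |t| > 2 is a rough stand-in for rejecting at about the .05 level
    rejected = sum(abs(t) > 2 for t in t_values)
    print(claim, [round(t, 2) for t in t_values], "rejected:", rejected)
```

Changing n_rows to 400 gives a quick preview of question 11.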
More Random-Number Fun
1. Suppose we have a box that has a trillion marbles. One
third of them are red. If we draw out 200 marbles, how many
red marbles should we expect to get? ________
2. The standard deviation of the box in this case is
about .471. (How did I get that? ______________) If we
compute the standard error of the count (or sum), it is
6.67. If we compute the standard error of the percent, it is
about 3.33%. Hence we would expect that 95% of the time the
sum should be in the range __________ to ___________.
3. We expect that 95% of the time, the percent should be
in the range _________ to ___________.
4. Now it is time to let the computer simulate drawing
200 marbles from this box. We will take ten draws each.
There are a couple ways of doing this, but let us use the
easiest one. Go to Compute under the Transform menu. You
need the equation v1=trunc(uniform(1.5)). Why does this
equation generate a series similar to what we would get
drawing red marbles out of the box described above?
5. Once you are doing it correctly, it is easy to add
additional columns to see what happens if we draw many
different samples of 200 from the box. All you have to do is
change the v1 to v2, v3, etc. Construct 20 columns of these
numbers. Then look at the descriptive statistics. Pull down
Analyze --> Descriptive Statistics --> Descriptives.
Click options and ask for sums. Click all the v1 to v20
variables over and then click OK. Are your sums close to
what you expected to get up in part 1? What is the biggest
chance error?
6. Are your standard deviations close to .471? What is
the biggest error?
7. We know what was in the box from which these samples
were drawn. But suppose we did not. How close could we
estimate the average of the box? We can see by creating
confidence intervals. To do this, pull down
Analyze --> Compare Means --> One Sample T test. Click
over the v1 to v20 variables. Click options and you should
see that the confidence interval is set to 95%. Put 0 in the
test value box. And then click OK.
In the output you get, how close are the standard errors
of the mean to the value that was given in part 2 above? Why
are they not all the same?
8. About 1 in 20 confidence intervals that we have
created should not contain the true mean of .3333. (Why one
in twenty? ______________________) Do you have any? If so,
what is it? If not, what is the one that is closest to not
containing .3333?
9. We can also create 99% confidence intervals. Redo the
steps in part 7, but when you click options, put in 99
instead of the 95 that is there. What happens to your
confidence intervals?
10. The primary purpose of the One Sample T-test is not
to construct confidence intervals, but to test claims. I
claim that the true mean is .333. Let us test that. Go back
and redo part 7, but put .333 in the test box. The t-values
you get in the output are like z-scores. Do you have any
over 2? If you do, the confidence interval that you
constructed above probably does not have .3333 in it. Does
it? (Or if all your confidence intervals had .333 in them,
the t-values you get should all be less than 2. Are they?)
We will explain this more later.
11. Suppose we did not know anything about the box but
believed that 50% of the marbles in it were red. Redo part
10 above, but with .5 as the test value. You should get
t-values that are negative and some quite a lot less than
-2. What is that telling you about the probability of
getting this sample if the true mean of the box was .5?
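The marble box can be simulated in Python as well. The sketch below assumes that int(rng.uniform(0, 1.5)) is a fair stand-in for the SPSS expression trunc(uniform(1.5)); the seed and helper name are illustrative. It also recomputes the standard deviation of the box and the two standard errors quoted in part 2.

```python
import math
import random
import statistics

P_RED = 1 / 3     # fraction of red marbles in the box
N_DRAWS = 200     # marbles drawn per sample

# Standard deviation of a 0/1 box, and the two standard errors
sd_box = math.sqrt(P_RED * (1 - P_RED))
se_sum = math.sqrt(N_DRAWS) * sd_box
se_percent = 100 * se_sum / N_DRAWS

def draw_column(rng, n=N_DRAWS):
    """Mimic trunc(uniform(1.5)): 1 (red) when the draw is 1.0 or more."""
    return [int(rng.uniform(0, 1.5)) for _ in range(n)]

rng = random.Random(2008)  # arbitrary seed so the run is repeatable
columns = [draw_column(rng) for _ in range(20)]

print(round(sd_box, 3), round(se_sum, 2), round(se_percent, 2))
for i, col in enumerate(columns, start=1):
    print(f"v{i}: sum={sum(col)}, sd={statistics.stdev(col):.3f}")
```

Numbers from 0 up to 1 truncate to 0 and numbers from 1 up to 1.5 truncate to 1, so about one third of the values are 1.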
Part I Death in the City
1. How old do you think people are on average when they
die _______? (Fill in a number before you do anything
else.)
2. A few years ago I took 30 people who had died and had
obituaries written about them in the Sunday Chicago Tribune.
Here are their ages:
- 89, 90, 66, 39, 55,
- 81, 101, 79, 44, 65,
- 67, 96, 72, 69, 100,
- 71, 72, 83, 63, 84,
- 91, 81, 77, 82, 72,
- 83, 86, 86, 36, 71
Enter these numbers in SPSS and give the column a name.
Then compute the descriptive statistics. You should get a
mean and a standard deviation. What are they?
3. Assume that this is a valid random sample. (In fact, it
is not.) We can test your claim in question 1 by
computing:
(Sample Average minus Claim) divided by (standard
deviation divided by square root of 30).
What do you get? (Hint: the claim is the number you
entered in question 1.)
4. Let us see what the computer gets. Pull down
Analyze to "Compare Means" to "One Sample T Test." Enter
the number from the top of this sheet (your guess as to the
average age of death) in the Test Value box. Slide over the
variable for these ages. Then hit OK. Do you get the same
results? Is your guess credible or not? Explain.
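The hand computation in question 3 can be checked with a few lines of Python. This is a minimal sketch: the ages are the 30 listed in question 2, while the guess of 70 is purely a hypothetical placeholder for your own answer to question 1.

```python
import math
import statistics

# Ages at death from the 30 obituaries listed in question 2
ages = [89, 90, 66, 39, 55, 81, 101, 79, 44, 65,
        67, 96, 72, 69, 100, 71, 72, 83, 63, 84,
        91, 81, 77, 82, 72, 83, 86, 86, 36, 71]

def one_sample_t(sample, claim):
    """Return (mean, sd, t) with t = (mean - claim) / (sd / sqrt(n))."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    t = (mean - claim) / (sd / math.sqrt(len(sample)))
    return mean, sd, t

my_guess = 70  # hypothetical guess; replace with your answer to question 1
mean, sd, t = one_sample_t(ages, my_guess)
print(f"mean={mean:.2f}  sd={sd:.2f}  t={t:.2f}")
```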
Part II Death in the Country
1. Although we have not met confidence intervals in the
text, we have seen them in the lab. Here
is a sample of ages from the obituaries that have appeared
in issues of a small-town newspaper from a few years ago. Why
might this not be a good way to take a sample? Can you think
of any possible biases?
2. I intentionally put a typo in these data. Do a
histogram and see if you can spot it. The correct number
should be 78.
3. Let us suppose that this is a random sample of ages of
death for the rural Midwest. Construct 53, 90, 95, and 99
percent confidence intervals for the true average age of
death. (To do this, pull down Analyze to Compare Means,
Select 1-sample t-test, then in options select the level of
confidence you want. You will have to do this procedure four
times to get the answers. There may be other ways to do this
in SPSS, but I do not know them. If you can find another
way, I would be happy to hear about it. No one would ever
use a 53% confidence interval, but we will do one because we
can.)
4. If we wanted a 95% confidence interval that was plus
or minus two years, how big would our sample have to be?
(You have to do some of the computation by hand.)
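The sample-size question can be sketched in Python under a normal approximation. The standard deviation of 15 years used here is a hypothetical value, not taken from the data; substitute the standard deviation from your own SPSS output. The function name is also illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(sd, margin, confidence=0.95):
    """Smallest n with z * sd / sqrt(n) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / margin) ** 2)

sd_guess = 15.0  # hypothetical SD in years; use the SD from your own output
print(required_sample_size(sd_guess, margin=2.0))
```

Halving the margin roughly quadruples the required sample, because the standard error shrinks with the square root of the sample size.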
We LOVE Statistics 2008
(The following is an exercise worked with SPSS. It is
presented as an example that may help other instructors come
up with ways to use a computer statistical program to help
students understand statistics.)
Up to now we have worked a number of lab problems in
probability. We understand a process (always done as pulling
tickets from a box) and we predict what should happen when
we draw various numbers of tickets from the box. Today we
will turn the problem around: What can we say about an
unknown process from simply seeing the results of that
process?
I am making a claim that the average number of hearts in
the little bags of Necco candies is 15. We want to test that
claim. We will take a sample from some bags that I recently
purchased.
1. To take a true random sample, every bag of Necco
candies should have an equal chance of being selected. Is
the way we are sampling a true random sample? What, if
anything, is wrong with it?
2. Is there any feasible way for us to actually get a
true random sample of these candies?
We are going to ignore any problems that you identified
in the previous answers and proceed as if we actually did
have a random sample. After all, we are not doing this for
any real serious purpose. Rather we are just playing with
numbers and a computer program to illustrate results in the
book.
Count the number of hearts in the packets. (After we have
recorded your results, you may eat them, but make sure we
have the numbers before you eat them.)
(Data the class found: 12, 13, 15, 12, 15, 14, 13, 15, 14,
13, 14, 15, 14, 15, 13, 14, 15, 15, 14, 13, 14, 12, 14, 14,
13, 14, 14, 15, 14, 13, 13, 14, 14, 13, 12, 13.5, 13, 13, 15,
14, 13, 12, 15, 15.5, 14, 13, 14, 16, 12, 14, 13, 12, 16, 13)
Open the file hearts2008.txt
and enter the numbers in the column titled "hearts2008."
3. It is always nice to see a histogram. Do you remember
how to have SPSS show you one? If you remember, do it. If
you do not remember, ask. What are the mean and standard
deviation of the sample?
4. Do you remember how to test a hypothesis? If you do,
test the hypothesis that the mean number of candy hearts is
15. If you do not remember, ask. Do you find the claim
plausible or not? Explain.
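As a cross-check on the SPSS output, the test of the claim of 15 can be sketched in Python using the 54 counts the class recorded. The variable names are illustrative, and the script reports only the t statistic, not the Sig value that SPSS prints.

```python
import math
import statistics

# Counts recorded by the class (54 packets; half values are as recorded)
hearts2008 = [12, 13, 15, 12, 15, 14, 13, 15, 14,
              13, 14, 15, 14, 15, 13, 14, 15, 15, 14, 13,
              14, 12, 14, 14, 13, 14, 14, 15, 14,
              13, 13, 14, 14, 13, 12, 13.5, 13, 13, 15,
              14, 13, 12, 15, 15.5, 14, 13, 14, 16,
              12, 14, 13, 12, 16, 13]

claim = 15  # claimed average number of hearts per bag
mean = statistics.mean(hearts2008)
sd = statistics.stdev(hearts2008)
se = sd / math.sqrt(len(hearts2008))  # standard error of the mean
t = (mean - claim) / se

print(f"n={len(hearts2008)}  mean={mean:.2f}  sd={sd:.2f}  se={se:.3f}  t={t:.2f}")
```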
*********************
I have given this exercise in the past and have saved the
results. I would like to use them to introduce another part
of statistical inference: confidence intervals.
For example, a few years ago I had students take a sample
of 40 bags of valentine hearts and count the number of
hearts in the bags. The average number of candies in each
bag was 15.5. What can we say about the true average of all
bags of heart candies?
Remember, to predict what the sample sum (or average)
would be, we needed to know the standard deviation and we do
not know that. However, we can compute the standard
deviation of the sample of 40 and use this as an estimate of
the true standard deviation. The standard deviation of the
sample was 3.52. An estimate of the standard error of the
sum would be 3.52 times the square root of 40. An estimate
of the standard error of the mean would be 3.52 divided by
the square root of 40, or .56.
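The two standard errors just described can be reproduced with a short Python sketch, using only the figures given above (40 bags, sample standard deviation 3.52). The variable names are illustrative.

```python
import math

n = 40        # bags in the earlier sample
sd = 3.52     # sample standard deviation reported in the text

se_sum = sd * math.sqrt(n)    # standard error of the sum
se_mean = sd / math.sqrt(n)   # standard error of the mean
print(round(se_sum, 2), round(se_mean, 2))
```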
5. We can say that approximately 95% of the time the
sample mean would be within two standard errors of the true
mean, or the true mean plus or minus about 1.12. But we do
not know the true mean, only the sample mean. However, the
statement is still true, and only needs to be looked at from
the other end. About 95% of the time the true mean will be
within 1.12 of the sample mean. We have a sample mean. Let
us take two standard errors either side, and call the result
a 95% confidence interval. That implies that if we do this
procedure many times, 95% of the time we will construct an
interval that will contain the true mean. What is our
interval in this case?
6. Let us have the computer calculate the confidence
intervals. To do this, pull down Analyze to Compare Means
and slide over to One-Sample T-Test. Move over the first
four columns, the ones that are called hearts or love. Make
sure the test value is set to 0. Click OK.
Which confidence interval did we compute in the previous
question?
7. How many of the confidence intervals include 15?
8. See if you can make 99% confidence intervals. (Hint:
you need to use the options button.)
9. How does the confidence interval for hearts2008
compare with the confidence intervals in past years? Does it
appear that they have changed the size of the packets or
not?
10. The first time I did this, the standard deviation of
the packets was very high and then declined. What does a
high standard deviation mean? What might have happened to
make it decline in the next years?
11. In 2006 the class sampled packets of M&Ms. What
seems to be the average number of M&Ms in a packet? Is
the confidence interval wider or narrower than the
confidence interval for HEARTS?
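The effect of the confidence level on the width of an interval can be sketched in Python from summary figures alone. This is an approximation: it uses the normal curve, whereas SPSS uses the t distribution and so reports slightly wider intervals. The inputs are the earlier sample of 40 bags (mean 15.5, standard deviation 3.52), and the function name is illustrative.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

for level in (0.95, 0.99):
    low, high = confidence_interval(15.5, 3.52, 40, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```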
An earlier and alternative version of the candy
exercise:
Measuring Candy 2006
1. In class last time we took a sample to determine how
many M&Ms were in the little bags of Halloween M&Ms.
The authors of our textbook would certainly tell us that the
sample we took was not a true random sample. Why was it not
a true random sample?
2. Should we use this sample to estimate what the entire
population of Halloween M&Ms packages were like?
Explain.
3. Is there any feasible way for us to actually get a
true random sample of these candies?
4. We are going to ignore the problems that are clearly
evident in the previous answer and proceed as if we actually
did have a random sample. After all, we are not doing this
for any real serious purpose. Rather we are just playing
with numbers and a computer program to illustrate results in
the book.
I e-mailed you an SPSS file with several columns of
numbers.
The first column, labeled mm_06, contains the data we
collected in class. Run one-sample t-test with this column
to get 95% and 99% confidence intervals. (Hint: To get a 99%
confidence interval, select options in the dialog box for
t-tests, and change the 95% to 99%.)
Look at the results in the one-sample statistics table.
What is the average of the sample? How was it obtained?
5. What is the std. deviation? How was it obtained?
6. What is the give-or-take value? How was it
obtained?
7. If we multiply the give-or-take value by 2 and add and
subtract that result from the mean, what do we get? How
close is this to the confidence interval that the computer
calculated?
8. If we took another sample of 45 bags of candies, would
we get the same results? Explain.
9. In past years I have had other classes conduct similar
samples. The columns named "heart_06", "love" and "hearts"
are samples from three other years using the NECCO heart
candy sold at Valentine's Day. Have SPSS construct
histograms of these three columns. You should notice
something—there is a rather striking result. This
result may be due to bad samples, or it may indicate
something has happened at the factory. What do these results
suggest might have happened?
10. The next topic we will look at is hypothesis testing.
What this procedure does is to begin with an initial claim,
then take a sample. On the basis of the sample we either
decide that the claim looks reasonable, or that it does not
look reasonable. For example, suppose the initial claim was
that an average bag of hearts had 14 pieces of candy. Given
the samples of NECCO hearts, would that claim look plausible
for any of the three years? Explain using the confidence
intervals.
11. The normal way of doing this is to do what is called
a t-test. Pull down Analyze to Compare Means to One Sample t
test. Put 14 in the test value box. Make sure that
"heart_06", "love", and "hearts" are in the test variable
box. Then click OK.
What the t-test does is compute the number of
standard-error units the test value is away from the sample
value. To compute it, we take the sample mean and subtract
the test value, and then divide by the standard error of the
mean (or what the book often calls the give-or-take number).
If the size of the t-value is greater than 2 (or less than
-2), the claim is shaky. If it is beyond 3, it is highly
dubious. What do you get? Based on the data we have, does 14
look plausible for any of the years?
12. Redo the t-test using 15 as the test value, and then
16. What happens?
Notice the column in the output called Sig. This is
statistical significance, a name that causes endless
confusion. If I had had a chance to name it, I would have
called it the randomness indicator. It gives the probability
of getting sample results at least as far from the claim as
yours if the claim is true. So if Sig is big (say .30 or 30%),
there would be a 30% chance of getting a sample mean that far
or farther from the claim if the claim were true. Since this
is a pretty big chance, the
claim is plausible. If Sig is .002, there is a 2 in 1000
chance of getting this result by random chance, and the
claim looks dubious.
13. I initially thought that the average number of
M&Ms per bag would be twelve. Test this claim with a
t-test (put 12 in the test value box). Would it look
plausible if this were a valid random sample?
14. You should not have to do a test for this: Is it
plausible that 50% of the M&Ms are brown in these bags?
(Hint—find the total orange and brown.) Why is no test
necessary?
15. One semester (probably in the fall) the class sampled
packets of M&Ms. I have included those results as well
(They are the column mm_total.) What seems to be the average
number of M&Ms in a packet? Is the confidence interval
wider or narrower than the confidence interval for the NECCO
candies? How do you explain this?
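The link between the t-value and Sig can be sketched in Python. The inputs below are illustrative only: they borrow the summary figures quoted for one past sample of hearts (mean 15.5, standard deviation 3.52, 40 bags), not the columns in the SPSS file. Sig is approximated with the normal curve, so it will differ a little from the value SPSS computes from the t distribution. The function name is made up for the example.

```python
import math
from statistics import NormalDist

def t_and_sig(mean, sd, n, claim):
    """Return the t-value and an approximate two-sided Sig."""
    t = (mean - claim) / (sd / math.sqrt(n))
    # chance of a result at least this far from the claim, in either direction
    sig = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, sig

for claim in (14, 15, 16):
    t, sig = t_and_sig(15.5, 3.52, 40, claim)
    print(f"claim={claim}  t={t:.2f}  approx Sig={sig:.3f}")
```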