Lab: Hypothesis Testing

Computer Exercises: Hypothesis Testing

(This exercise was designed for use with SPSS. It may be modified to work with other programs, and it may have to be modified to work with the current version of SPSS.)

Testing Random Numbers

1. For your next paper you need to sample the books in the library. The best way to do this is to take a true random sample, and we can do this because each book has a number assigned to it between 1 and 138290. We will use SPSS to generate 100 random numbers in this range and they will help us get our random sample. But since we need to go to the lab and get these numbers, we might as well do a few other things as well and maybe learn something in the process. (If you try to do this with no thought, it is a waste of time. Take your time and try to understand why you are getting the results you are getting.)

1. Open SPSS and generate 100 random numbers uniformly distributed between 0 and 138290. (You should remember how to do this. If not, look at your old assignments. If you still do not know, ask.) Call this column r1, set it up so no decimals show, and print it. (You will need this page for taking your random sample.)

2. Now let us generate seven other columns of random numbers between 0 and 138290. (From the data window, choose Transform, Compute, and then type a new name in the target variable box: r2, r3, r4, etc.) When you are finished, you should have eight samples taken from a uniform distribution.

3. What is the mean of the distribution you took these samples from? __________

4. Goldilocks said to herself, "I think 80,000 is too big." Let's test the hypothesis that the true mean is 80000 and see if Goldilocks is correct. In this case, we can test it with 8 samples. Pull down Analyze to Compare Means to One Sample T test, click over all the random variable columns you have created, put in 80000 in the test value box, and hit OK. If you set the critical level of significance at .05, in how many of the samples would you reject the hypothesis that the true mean is 80000? Explain.

5. If we set alpha (the critical level of significance) at .01, in how many of the samples would you reject the hypothesis that the true mean is 80000?

6. Then Goldilocks said, "I think 50000 is too small." Use all eight samples to test the hypothesis that the true mean is 50000. If we set alpha to .05, how many of the samples reject the claim that 50000 is the true mean? How about if we set alpha to .01? Is Goldilocks right that 50000 is too small?

7. Finally Goldilocks said, "I think 69145 is just right." Let us test her claim. Do any of the eight samples you have reject it if alpha is set to .05? How about if alpha is set to .01?
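If you want to check your SPSS output against another program, the three Goldilocks tests can be sketched in Python. This is only a rough sketch under stated assumptions: the seed, the names `samples` and `t_statistic`, and the use of "size of t above about 2" as a stand-in for the .05 cutoff are illustrative choices, not something SPSS does for you. Your own random numbers will differ, so your counts may differ too.

```python
import math
import random
import statistics


def t_statistic(sample, claim):
    """(sample mean - claim) divided by (sample SD / square root of n)."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - claim) / standard_error


rng = random.Random(2008)  # arbitrary seed (an assumption) so the run is repeatable

# Eight samples of 100 draws, uniform between 0 and 138290 (stand-ins for r1..r8).
samples = [[rng.uniform(0, 138290) for _ in range(100)] for _ in range(8)]

for claim in (80000, 50000, 69145):
    t_values = [t_statistic(s, claim) for s in samples]
    # Rough rule of thumb: a t-value beyond about 2 in size means "reject at .05".
    rejected = sum(1 for t in t_values if abs(t) > 2)
    print(claim, [round(t, 2) for t in t_values], "rejected:", rejected)
```

The exact cutoff for 99 degrees of freedom is slightly below 2, so a sample sitting right on the border may be classified differently by SPSS.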

8. The standard error of the mean is an estimate of how much the sample means will vary from sample to sample. Let us see if it looks right. Take the eight sample means that you have and put them in a new column. (I do not know of any automatic way to do this, but there may be one.) Compute the descriptive statistics for this column. About what should your mean be? Is the standard deviation that you get similar to the standard errors of the mean that were reported in your t-tests?
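The same comparison can be sketched outside SPSS. The Python below is a minimal illustration, assuming eight simulated columns like r1 to r8; the seed and variable names are made up for the example. It prints the standard deviation of the eight sample means next to the standard error estimated from each sample.

```python
import math
import random
import statistics

rng = random.Random(1)  # arbitrary seed (an assumption) for a repeatable run

# Eight simulated columns of 100 uniform random numbers between 0 and 138290.
samples = [[rng.uniform(0, 138290) for _ in range(100)] for _ in range(8)]

# One mean per column: this is the "new column" of eight sample means.
sample_means = [statistics.mean(s) for s in samples]

# Standard error of the mean estimated separately from each column.
standard_errors = [statistics.stdev(s) / math.sqrt(len(s)) for s in samples]

# Spread of the eight means, to compare with the standard errors above.
sd_of_means = statistics.stdev(sample_means)

print("mean of the sample means:", round(statistics.mean(sample_means)))
print("SD of the sample means:  ", round(sd_of_means))
print("standard errors:         ", [round(se) for se in standard_errors])
```

With only eight means, the standard deviation of the means is itself a noisy number, so expect rough agreement rather than a close match.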

9. The next chapter will introduce something called confidence intervals. We can compute them, and since we probably have some time, let us do that. Go back to the t-test and put in a test value of zero. Notice that the resulting confidence intervals have a lower value and an upper value. In how many of them does the true mean of 69145 fall?

10. We can compute intervals that we have more confidence in by making them wider. Go back to the t-test and select the options button. In the dialog box that comes up, you can set the confidence intervals to 99%. Do so and then hit OK. The true mean should appear in 99% of confidence intervals that we have constructed in this way, so there should not be more than one or two confidence intervals in the entire class that do not have the true mean. Does the true mean fall in all of your confidence intervals?

11. If you have time, redo your columns of random numbers so they have 400 values instead of 100. What do you expect will happen if we redo all the results above? Try it and see if you are right.

More Random-Number Fun

1. Suppose we have a box that has a trillion marbles. One third of them are red. If we draw out 200 marbles, how many red marbles should we expect to get? ________

2. The standard deviation of the box in this case is about .471. (How did I get that? ______________) If we compute the standard error of the count (or sum), it is 6.67. If we compute the standard error of the percent, it is about 3.33%. Hence we would expect that 95% of the time the sum should be in the range __________ to ___________.

3. We expect that 95% of the time, the percent should be in the range _________ to ___________.
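The standard errors quoted in part 2 can be checked with a few lines of arithmetic. The sketch below assumes the usual formulas for a box of 0s and 1s (the variable names are illustrative); it stops short of the ranges, which are left for you to fill in.

```python
import math

p_red = 1 / 3   # fraction of red marbles in the box
n_draws = 200   # marbles drawn

# SD of a 0/1 box: square root of (fraction of 1s times fraction of 0s).
sd_box = math.sqrt(p_red * (1 - p_red))

# Standard error of the count (sum) and of the percent.
se_count = sd_box * math.sqrt(n_draws)
se_percent = se_count / n_draws * 100

print(round(sd_box, 3), round(se_count, 2), round(se_percent, 2))
```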

4. Now it is time to let the computer simulate drawing 200 marbles from this box. We will take ten draws each. There are a couple ways of doing this, but let us use the easiest one. Go to Compute under the Transform menu. You need the equation v1=trunc(uniform(1.5)). Why does this equation generate a series similar to what we would get drawing red marbles out of the box described above?
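For readers working outside SPSS, the same kind of draw can be imitated in Python. This is a sketch, not a translation of SPSS internals: it assumes that `uniform(1.5)` produces numbers spread evenly between 0 and 1.5, and the function name `draw_marble` and the seed are invented for the example.

```python
import math
import random

rng = random.Random(41)  # arbitrary seed (an assumption) for a repeatable run


def draw_marble(rng):
    """Return 1 for a red marble and 0 otherwise."""
    # A number spread evenly over 0 to 1.5, with the fractional part dropped.
    return math.trunc(rng.uniform(0, 1.5))


v1 = [draw_marble(rng) for _ in range(200)]
print("red marbles in 200 draws:", sum(v1))
```

Thinking about which stretch of the 0 to 1.5 range truncates to 0 and which truncates to 1 is a good way into the question above.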

5. Once you are doing it correctly, it is easy to add additional columns to see what happens if we draw many different samples of 200 from the box. All you have to do is change the v1 to v2, v3, etc. Construct 20 columns of these numbers. Then look at the descriptive statistics. Pull down Analyze-->Descriptive Statistics-->Descriptives. Click options and ask for sums. Click all the v1 to v20 variables over and then click OK. Are your sums close to what you expected to get up in part 1? What is the biggest chance error?

6. Are your standard deviations close to .471? What is the biggest error?

7. We know what was in the box from which these samples were drawn. But suppose we did not. How close could we estimate the average of the box? We can see by creating confidence intervals. To do this, pull down Analyze-->Compare Means-->One Sample T test. Click over the v1 to v20 variables. Click options and you should see that the confidence interval is set to 95%. Put 0 in the test value box. And then click OK.

In the output you get, how close are the standard errors of the mean to the value that was given in part 2 above? Why are they not all the same?

8. About 1 in 20 confidence intervals that we have created should not contain the true mean of .3333. (Why one in twenty? ______________________) Do you have any? If so, what is it? If not, what is the one that is closest to not containing .3333?
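The idea that some intervals miss can be watched in a small simulation. The Python sketch below is an illustration under assumptions: it builds 20 columns of 200 simulated marble draws and uses "mean plus or minus two standard errors" as a rough stand-in for the 95% interval SPSS reports. The seed and the names `confidence_interval`, `columns`, and `misses` are made up for the example.

```python
import math
import random
import statistics

rng = random.Random(20)  # arbitrary seed (an assumption) for a repeatable run
TRUE_MEAN = 1 / 3


def confidence_interval(sample, multiplier=2.0):
    """Sample mean plus or minus `multiplier` standard errors."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - multiplier * standard_error, mean + multiplier * standard_error


# Twenty columns (stand-ins for v1..v20), each with 200 draws of 0 or 1.
columns = [[math.trunc(rng.uniform(0, 1.5)) for _ in range(200)] for _ in range(20)]

intervals = [confidence_interval(c) for c in columns]
misses = [(lo, hi) for lo, hi in intervals if not lo <= TRUE_MEAN <= hi]

print(len(misses), "of", len(intervals), "intervals miss the true mean")
```

Changing the seed gives a different set of 20 columns; over many reruns the number of misses bounces around, which is the point of the question.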

9. We can also create 99% confidence intervals. Redo the steps in part 7, but when you click options, put in 99 instead of the 95 that is there. What happens to your confidence intervals?

10. The primary purpose of the One Sample T-test is not to construct confidence intervals, but to test claims. I claim that the true mean is .333. Let us test that. Go back and redo part 7, but put .333 in the test value box. The t-values you get in the output are like z-scores. Do you have any over 2? If you do, the confidence interval that you constructed above probably does not have .333 in it. Does it? (Or if all your confidence intervals had .333 in them, the t-values you get should all be less than 2. Are they?) We will explain this more later.

11. Suppose we did not know anything about the box but believed that 50% of the marbles in it were red. Redo part 10 above, but with .5 as the test value. You should get t-values that are negative and some quite a lot less than -2. What is that telling you about the probability of getting this sample if the true mean of the box was .5?

Part I Death in the City

1. How old do you think people are on average when they die _______? (Fill in a number before you do anything else.)

2. A few years ago I took a sample of 30 people who died and had obits written about them in the Sunday Chicago Tribune. Here are their ages:

89, 90, 66, 39, 55,

81, 101, 79, 44, 65,

67, 96, 72, 69, 100,

71, 72, 83, 63, 84,

91, 81, 77, 82, 72,

83, 86, 86, 36, 71

Enter these numbers in SPSS and give the column a name. Then compute the descriptive statistics. You should get a mean and a standard deviation. What are they?

3. Assume that this is a valid random sample. (In fact, it is not.) We can test your claim in question 1 by computing:

(Sample Average minus Claim) divided by (standard deviation divided by square root of 30).

What do you get? (Hint: the claim is the number you entered in question 1.)
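The hand computation can also be laid out step by step in Python as a cross-check. In this sketch the value of `claim` is a made-up placeholder, not a recommended answer; replace it with the number you wrote in question 1. The standard deviation used here divides by n - 1, which is assumed to match what SPSS reports.

```python
import math
import statistics

# Ages from the 30 obituaries listed above.
ages = [89, 90, 66, 39, 55, 81, 101, 79, 44, 65,
        67, 96, 72, 69, 100, 71, 72, 83, 63, 84,
        91, 81, 77, 82, 72, 83, 86, 86, 36, 71]

claim = 70  # hypothetical guess; put your own answer to question 1 here

sample_mean = statistics.mean(ages)
sample_sd = statistics.stdev(ages)  # divides by n - 1
standard_error = sample_sd / math.sqrt(len(ages))

# (Sample average minus claim) divided by (SD divided by square root of 30).
t_value = (sample_mean - claim) / standard_error

print("mean:", round(sample_mean, 2), "SD:", round(sample_sd, 2))
print("t-value for a claim of", claim, "is", round(t_value, 2))
```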

4. Let us see what the computer gets. Pull down Statistics to "Compare Means" to "One Sample T Test." Enter the number from the top of this sheet (your guess as to the average age of death) in the Test Value box. Slide over the variable for these ages. Then hit OK. Do you get the same results? Is your guess credible or not? Explain.

Part II Death in the Country:

1. Although we have not met confidence intervals in the text, we have seen them in the lab. Here is a sample of ages from the obituaries that have appeared in issues of a small-town newspaper from a few years ago. Why might this not be a good way to take a sample? Can you think of any possible biases?

2. I intentionally put a typo in these data. Do a histogram and see if you can spot it. The correct number should be 78.

3. Let us suppose that this is a random sample of ages of death for the rural Midwest. Construct 53, 90, 95, and 99 percent confidence intervals for the true average age of death. (To do this, pull down Analyze to Compare Means, Select 1-sample t-test, then in options select the level of confidence you want. You will have to do this procedure four times to get the answers. There may be other ways to do this in SPSS, but I do not know them. If you can find another way, I would be happy to hear about it. No one would ever use a 53% confidence interval, but we will do one because we can.)

4. If we wanted a 95% confidence interval that was plus or minus two years, how big would our sample have to be? (You have to do some of the computation by hand.)
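Once you have worked the computation by hand, a short function can confirm the arithmetic. This sketch assumes the "two standard errors" rule used in the lab for a 95% interval; the function name and the standard deviation of 15 in the example call are invented for illustration and are not taken from the data.

```python
import math


def required_sample_size(sd_estimate, margin, multiplier=2.0):
    """Smallest n for which multiplier * sd / sqrt(n) is no bigger than margin."""
    return math.ceil((multiplier * sd_estimate / margin) ** 2)


# Hypothetical example: an SD of 15 years and a margin of plus or minus 2 years.
example_n = required_sample_size(sd_estimate=15, margin=2)
print(example_n)
```

Use the standard deviation from your own output in place of the made-up 15.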

We LOVE Statistics 2008

(The following is an exercise worked with SPSS. It is presented as an example that may help other instructors come up with ways to use a computer statistical program to help students understand statistics.)

Up to now we have worked a number of lab problems in probability. We understand a process (always done as pulling tickets from a box) and we predict what should happen when we draw various numbers of tickets from the box. Today we will turn the problem around: What can we say about an unknown process from simply seeing the results of that process?

I am making a claim that the average number of hearts in the little bags of Necco candies is 15. We want to test that claim. We will take a sample from some bags that I recently purchased.

1. To take a true random sample, every bag of Necco candies should have an equal chance of being selected. Is the way we are sampling a true random sample? What, if anything, is wrong with it?

2. Is there any feasible way for us to actually get a true random sample of these candies?

We are going to ignore any problems that you identified in the previous answers and proceed as if we actually did have a random sample. After all, we are not doing this for any real serious purpose. Rather we are just playing with numbers and a computer program to illustrate results in the book.

Count the number of hearts in the packets. (After we have recorded your results, you may eat them, but make sure we have the numbers before you eat them.)

(Data the class found: 12, 13, 15, 12, 15, 14, 13, 15, 14, 13, 14, 15, 14, 15, 13, 14, 15, 15, 14, 13, 14, 12, 14, 14, 13, 14, 14, 15, 14, 13, 13, 14, 14, 13, 12, 13.5, 13, 13, 15, 14, 13, 12, 15, 15.5, 14, 13, 14, 16, 12, 14, 13, 12, 16, 13)

Open the file hearts2008.txt and enter the numbers in the column titled "hearts2008."

3. It is always nice to see a histogram. Do you remember how to have SPSS show you one? If you remember, do it. If you do not remember, ask. What are the mean and standard deviation of the sample?

4. Do you remember how to test a hypothesis? If you do, test the hypothesis that the mean number of candy hearts is 15. If you do not remember, ask. Do you find the claim plausible or not? Explain.
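As a cross-check on the SPSS output, the test can be sketched in Python using the class data listed above. The variable names are illustrative, and the sketch only computes the t-value; judging whether the claim is plausible is still up to you.

```python
import math
import statistics

# Counts recorded by the class in 2008 (54 packets).
hearts2008 = [12, 13, 15, 12, 15, 14, 13, 15, 14, 13, 14, 15, 14, 15, 13, 14, 15, 15,
              14, 13, 14, 12, 14, 14, 13, 14, 14, 15, 14, 13, 13, 14, 14, 13, 12, 13.5,
              13, 13, 15, 14, 13, 12, 15, 15.5, 14, 13, 14, 16, 12, 14, 13, 12, 16, 13]

claim = 15  # the claimed average number of hearts per bag

sample_mean = statistics.mean(hearts2008)
sample_sd = statistics.stdev(hearts2008)  # divides by n - 1
standard_error = sample_sd / math.sqrt(len(hearts2008))

# Number of standard errors between the sample mean and the claim.
t_value = (sample_mean - claim) / standard_error

print("n:", len(hearts2008), "mean:", round(sample_mean, 2), "SD:", round(sample_sd, 2))
print("t-value against a claim of", claim, "is", round(t_value, 2))
```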

*********************

I have given this exercise in the past and have saved the results. I would like to use them to introduce another part of statistical inference: confidence intervals.

For example, a few years ago I had students take a sample of 40 bags of valentine hearts and count the number of hearts in the bags. The average number of candies in each bag was 15.5. What can we say about the true average of all bags of heart candies?

Remember, to predict what the sample sum (or average) would be, we needed to know the standard deviation and we do not know that. However, we can compute the standard deviation of the sample of 40 and use this as an estimate of the true standard deviation. The standard deviation of the sample was 3.52. An estimate of the standard error of the sum would be 3.52 times the square root of 40. An estimate of the standard error of the mean would be 3.52 divided by the square root of 40, or .56.
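The two standard errors in this paragraph are quick to verify. The sketch below only repeats the arithmetic with the numbers given above; the variable names are illustrative.

```python
import math

sample_sd = 3.52  # standard deviation of the sample of 40 bags
n_bags = 40

# Standard error of the sum: SD times the square root of the sample size.
se_sum = sample_sd * math.sqrt(n_bags)

# Standard error of the mean: SD divided by the square root of the sample size.
se_mean = sample_sd / math.sqrt(n_bags)

print(round(se_sum, 2), round(se_mean, 2))
```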

5. We can say that approximately 95% of the time the sample mean would be within two standard errors of the true mean, or the true mean plus or minus about 1.12. But we do not know the true mean, only the sample mean. However, the statement is still true, and only needs to be looked at from the other end. About 95% of the time the true mean will be within 1.12 of the sample mean. We have a sample mean. Let us take two standard errors either side, and call the result a 95% confidence interval. That implies that if we do this procedure many times, 95% of the time we will construct an interval that will contain the true mean. What is our interval in this case?

6. Let us have the computer calculate the confidence intervals. To do this, pull down Analyze to Compare Means and slide over to One-Sample T-Test. Move over the first four columns, the ones that are called hearts or love. Make sure the test value is set to 0. Click OK.

Which confidence interval did we compute in the previous question?

7. How many of the confidence intervals include 15?

8. See if you can make 99% confidence intervals. (Hint: you need to use the options button.)

9. How does the confidence interval for hearts2008 compare with the confidence intervals in past years? Does it appear that they have changed the size of the packets or not?

10. The first time I did this, the standard deviation of the packets was very high and then declined. What does a high standard deviation mean? What might have happened to make it decline in the next years?

11. In 2006 the class sampled packets of M&Ms. What seems to be the average number of M&Ms in a packet? Is the confidence interval wider or narrower than the confidence interval for HEARTS?

An earlier and alternative version of the candy exercise:

Measuring Candy 2006

1. In class last time we took a sample to determine how many M&Ms were in the little bags of Halloween M&Ms. The authors of our textbook would certainly tell us that the sample we took was not a true random sample. Why was it not a true random sample?

2. Should we use this sample to estimate what the entire population of Halloween M&Ms packages is like? Explain.

3. Is there any feasible way for us to actually get a true random sample of these candies?

4. We are going to ignore the problems that are clearly evident in the previous answer and proceed as if we actually did have a random sample. After all, we are not doing this for any real serious purpose. Rather we are just playing with numbers and a computer program to illustrate results in the book.

I e-mailed you an SPSS file with several columns of numbers.

The first column, labeled mm_06, contains the data we collected in class. Run a one-sample t-test with this column to get 95% and 99% confidence intervals. (Hint: To get a 99% confidence interval, select options in the dialog box for t-tests, and change the 95% to 99%.)

Look at the results in the one-sample statistics table. What is the average of the sample? How was it obtained?

5. What is the std. deviation? How was it obtained?

6. What is the give-or-take value? How was it obtained?

7. If we multiply the give-or-take value by 2 and add and subtract that result from the mean, what do we get? How close is this to the confidence interval that the computer calculated?

8. If we took another sample of 45 bags of candies, would we get the same results? Explain.

9. In past years I have had other classes conduct similar samples. The columns named "heart_06", "love" and "hearts" are samples from three other years using the NECCO heart candy sold at Valentine's Day. Have SPSS construct histograms of these three columns. You should notice something—there is a rather striking result. This result may be due to bad samples, or it may indicate something has happened at the factory. What do these results suggest might have happened?

10. The next topic we will look at is hypothesis testing. What this procedure does is to begin with an initial claim, then take a sample. On the basis of the sample we either decide that the claim looks reasonable, or that it does not look reasonable. For example, suppose the initial claim was that an average bag of hearts had 14 pieces of candy. Given the samples of NECCO hearts, would that claim look plausible for any of the three years? Explain using the confidence intervals.

11. The normal way of doing this is to do what is called a t-test. Pull down Analyze to Compare Means to One Sample t test. Put 14 in the test value box. Make sure that "heart_06", "love", and "hearts" are in the test variable box. Then click OK.

What the t-test does is compute the number of standard-error units the test value is away from the sample value. To compute it, we take the sample mean and subtract the test value, and then divide by the standard error of the mean (or what the book often calls the give-or-take number). If the size of the t-value is greater than 2 (or less than -2), the claim is shaky. If it is beyond 3, it is highly dubious. What do you get? Based on the data we have, does 14 look plausible for any of the years?

12. Redo the t-test using 15 as the test value, and then 16. What happens?

Notice the column in the output called Sig. This is statistical significance, a name that causes endless confusion. If I had had a chance to name it, I would have called it the randomness indicator. It gives the probability of getting a sample result at least as far from the claim as yours if the claim is true. So if Sig is big (say .30 or 30%), there would be a 30% chance of getting a sample mean that far or farther from the claim if the claim were true. Since this is a pretty big chance, the claim is plausible. If Sig is .002, there is a 2 in 1000 chance of getting a result this extreme by random chance, and the claim looks dubious.
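To see how a t-value turns into a number like Sig, here is a rough Python sketch. It is an approximation, not what SPSS computes: it uses the normal curve in place of the t curve, which is reasonable only when the sample is fairly large, and the function name `approx_sig` is invented for the example.

```python
from statistics import NormalDist


def approx_sig(t_value):
    """Two-sided tail area beyond the t-value under the normal curve.

    This is a normal-curve approximation; SPSS uses the t curve, so its
    Sig will be somewhat larger for small samples.
    """
    return 2 * (1 - NormalDist().cdf(abs(t_value)))


for t in (1, 2, 3):
    print("t =", t, "approximate Sig =", round(approx_sig(t), 4))
```

The printout lines up with the rule of thumb above: a t-value near 2 gives a chance of roughly 1 in 20, and a t-value near 3 gives a much smaller one.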

13. I initially thought that the average number of M&Ms per bag would be twelve. Test this claim with a t-test (put 12 in the test value box). Would it look plausible if this were a valid random sample?

14. You should not have to do a test for this: Is it plausible that 50% of the M&Ms are brown in these bags? (Hint—find the total orange and brown.) Why is no test necessary?

15. One semester (probably in the fall) the class sampled packets of M&Ms. I have included those results as well (They are the column mm_total.) What seems to be the average number of M&Ms in a packet? Is the confidence interval wider or narrower than the confidence interval for the NECCO candies? How do you explain this?
