We live in an uncertain and risky world. Statistics can sometimes help us deal with this world.

In studying chance, mathematicians developed the concept of a probability distribution. The normal curve is one example. A simple example, but one that is often useful, is the model of tickets in a box. Suppose we have the following tickets in a box: 1, 1, 4.

What is the average of the box?

6/3 = 2

What is the standard deviation of the box?

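The deviations from the average are -1, -1, and 2, so the squared deviations are 1, 1, and 4: squareroot((1 + 1 + 4)/3) = squareroot(6/3) = squareroot(2), or about 1.41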

Suppose we have a box of unknown tickets and we draw out these tickets: 1, 1, 1, 1, 4, 4. What is the average of this sample? What is the standard deviation of this sample?

Average is 12/6 = 2; variance is (1 + 1 + 1 + 1 + 4 + 4)/5 = 12/5 = 2.4; standard deviation is squareroot(2.4), or about 1.55
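The difference between the two calculations above (dividing by n for a whole box, by n - 1 for a sample) can be checked with a short sketch. It uses Python's standard statistics module; the variable names are ours, chosen only for illustration.

```python
import statistics

box = [1, 1, 4]               # the whole box, so we divide by n
sample = [1, 1, 1, 1, 4, 4]   # a sample from an unknown box, so we divide by n - 1

box_mean = statistics.mean(box)
box_sd = statistics.pstdev(box)            # population SD: squareroot(6/3)

sample_mean = statistics.mean(sample)
sample_var = statistics.variance(sample)   # sample variance: 12/5
sample_sd = statistics.stdev(sample)       # squareroot(12/5)

print(box_mean, box_sd)
print(sample_mean, sample_var, sample_sd)
```

Note that pstdev and stdev give different answers for the same list of numbers, so it matters whether the numbers are the whole box or only a sample from it.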

There is some magic in statistics. Although we cannot predict what will happen on any single draw from a box of tickets (or when taking one random sample), we can be pretty sure about what will happen if we take many draws. This is the result of the central limit theorem and is the basis of the gambling and insurance industries.

If we draw 200 times from the box of tickets, 1, 1, 4, we should get a total of _400_ give or take about _20_. (This second number is the standard error of the sum.) We expect an average of _2_ give or take about _0.1_. (This second number is the standard error of the mean.)
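Here is a minimal sketch of where those four numbers come from, assuming the draws are made at random with replacement. The function name standard_errors is our own, not part of any library.

```python
import math
import statistics

def standard_errors(box, n_draws):
    """Expected sum and mean, with their standard errors, for draws with replacement."""
    mean = statistics.mean(box)
    sd = statistics.pstdev(box)               # SD of the box (divide by n)
    expected_sum = n_draws * mean
    se_sum = math.sqrt(n_draws) * sd          # SE of the sum grows like squareroot(n)
    se_mean = sd / math.sqrt(n_draws)         # SE of the mean shrinks like 1/squareroot(n)
    return expected_sum, se_sum, mean, se_mean

expected_sum, se_sum, expected_mean, se_mean = standard_errors([1, 1, 4], 200)
print(expected_sum, se_sum, expected_mean, se_mean)
```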

Why do we have this tendency to end up close to the expected sum or expected average? Because there are a great many sequences that will give us a sum close to 400 relative to the number that give us sums that are far away.
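The counting argument can be made concrete. The sketch below assumes 200 draws with replacement, so that all 3^200 sequences of tickets are equally likely, and counts how many of them give a sum within two standard errors of 400 (that is, from 360 to 440). The helper name sequences_with_sum is ours.

```python
from fractions import Fraction
from math import comb

n = 200
total_sequences = 3 ** n   # three tickets, each equally likely on every draw

def sequences_with_sum(total):
    """Number of ticket sequences whose sum is `total`."""
    # With k draws of the 4 and n - k draws of a 1, the sum is n + 3k.
    k, remainder = divmod(total - n, 3)
    if remainder != 0 or k < 0 or k > n:
        return 0
    # Choose which draws are the 4; each other draw can be either of the two 1s.
    return comb(n, k) * 2 ** (n - k)

near = sum(sequences_with_sum(s) for s in range(360, 441))
fraction_near = Fraction(near, total_sequences)
print(float(fraction_near))
```

The overwhelming majority of the sequences land in that window, which is why a sum far from 400 would surprise us.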

Statistical inference uses this theory to turn things around. If we draw 200 times from an unknown population and get an average of 2.1 and a standard deviation of 1.4, would we be suspicious of the claim that the average of the box is in fact 2? How about 2.5? Explain. (This type of procedure is fundamental in quality control.)

The standard error of the mean would be about 0.1 (1.4 divided by the square root of 200). So 2 and 2.1 are only one standard error apart, and it is quite likely that we would get a result like 2.1 by random chance if the true mean is 2. However, 2.5 is 4 standard errors away from 2.1, and it is very unlikely that we would get a result that far away from 2.5 by random chance. Hence, we would doubt that 2.5 is the true mean if we got a sample mean of 2.1 and a sample standard deviation of 1.4.
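The same comparison can be sketched in a few lines. The function name z_score is our own, and the sample standard deviation is used as a stand-in for the unknown standard deviation of the box.

```python
import math

def z_score(sample_mean, sample_sd, n, claimed_mean):
    """How many standard errors the sample mean is from the claimed mean."""
    se_mean = sample_sd / math.sqrt(n)
    return (sample_mean - claimed_mean) / se_mean

z_for_2 = z_score(2.1, 1.4, 200, 2.0)     # about 1 standard error above the claim
z_for_2_5 = z_score(2.1, 1.4, 200, 2.5)   # about 4 standard errors below the claim
print(z_for_2, z_for_2_5)
```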

Suppose we have no claims about the unknown population but just want to know what the average is. How would we explain what we think the average is and how precise that estimate is using a confidence interval if the total of the sample of 100 is 191 and the standard deviation of the sample is 1.4? (This type of inference is used every time you see polls about the election that will be happening in November.)

The sample average is 191/100 = 1.91. The standard error of the mean is 0.14, which is 1.4 divided by the square root of 100. We expect to be within two standard errors of the true mean about 95% of the time, so a 95% confidence interval would be 1.91 ± 2*(0.14), or 1.91 ± 0.28

Now that we have had this trip down memory lane, are you ready to move on to the much more exciting area of multivariate statistics?

Alternative numbers

1. In studying chance, mathematicians developed the concept of a probability distribution. The normal curve is one example. A simple example, but one that is often useful, is the model of tickets in a box. Suppose we have the following tickets in a box: 1, 3, 4, 5, 7.

What is the average of the box?
What is the standard deviation of the box?

mean: 20/5 = 4; variance is .2*(9 + 1 + 0 + 1 + 9) = 4; standard deviation is 2.

2. Suppose we have a box of unknown tickets and we draw out these tickets: 7, 7, 5, 5, 4, 4, 3, 3, 1, 1. What is the average of this sample? What is the standard deviation of this sample?

mean: 40/10 = 4; variance is 40/9, or about 4.44; standard deviation is the square root of the variance, or about 2.11.

There is some magic in statistics. Although we cannot predict what will happen on any single draw from a box of tickets (or when taking one random sample), we can be pretty sure about what will happen if we take many draws. This is the result of the central limit theorem and is the basis of the gambling and insurance industries.

3. If we draw 90 times from the box of tickets, 1, 3, 4, 5, 7, we should get a total of _360_ give or take about _19_ (2 times the square root of 90). This second number is the standard error of the sum. We expect an average of _4_ give or take about _0.21_ (2 divided by the square root of 90). (This second number is the standard error of the mean.)
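One way to see the give-or-take value in action is to simulate the draws. The sketch below assumes draws with replacement and repeats the 90-draw experiment 2000 times; the repeat count and the seed are arbitrary choices of ours.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable

box = [1, 3, 4, 5, 7]
n_draws = 90
n_repeats = 2000

# Each entry is the total of one set of 90 draws with replacement.
sums = [sum(random.choices(box, k=n_draws)) for _ in range(n_repeats)]

average_of_sums = statistics.mean(sums)
spread_of_sums = statistics.pstdev(sums)
print(average_of_sums, spread_of_sums)
```

The average of the simulated totals should land near 360, and their spread should land near 2 times the square root of 90.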

Why do we have this tendency to end up close to the expected sum or expected average? See response on previous problem.

4. Statistical inference uses this theory to turn things around. If we draw 144 times from an unknown population and get an average of 3.9 and a standard deviation of 2.1, would we be suspicious of the claim that the average of the box is in fact 4? How about 4.5? Explain. (This type of procedure is fundamental in quality control.)

We would not be suspicious of the claim that the average is 4. The standard error of the mean is 2.1/12, or 0.175. Our sample average of 3.9 is less than one standard error away from the claim, and if the claim is true, this is the sort of result we would expect. The small difference may be due to random chance. We would be suspicious of the claim that the average is 4.5, because our sample average is 0.6/0.175, or about 3.4, standard errors away from that claim, and this would happen by random chance less than one percent of the time.

5. Suppose we have no claims about the unknown population but just want to know what the average is. How would we explain what we think the average is and how precise that estimate is using a confidence interval if the total of the sample of 100 is 419 and the standard deviation of the sample is 1.9?

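The sample average is 419/100 = 4.19 and the standard error of the mean is 1.9/10 = .19, so a 95% confidence interval is 4.19 ± 2*(.19), or 4.19 ± .38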
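As a rough sketch, the interval can be computed like this. The function name confidence_interval is ours, and the multiplier of 2 is the usual rule of thumb for about 95% confidence (the more exact value is 1.96).

```python
import math

def confidence_interval(total, n, sample_sd, multiplier=2):
    """Approximate confidence interval for the average of the box."""
    sample_mean = total / n
    se_mean = sample_sd / math.sqrt(n)
    margin = multiplier * se_mean
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(total=419, n=100, sample_sd=1.9)
print(low, high)
```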

1. Professor Box gave his class of 20 students a multiple-choice test with 15 questions on it. Each question had four options, and the average score on the test was just 32% correct. Professor Box wants to know if his students were just guessing randomly, or if they actually knew something. His friend Professor Model says that if they put their heads together, they should be able to figure it out (a Box-Model solution).

Having 20 students randomly answer 15 questions, each question with one correct option and three incorrect options, is like taking 300 draws from the box that has these tickets: _1, 0, 0, 0_

The average of the box is _.25_ and its standard deviation is _0.433_ (square root of .25*.75)

If all the students guess randomly, the expected sum is _75_ and the expected average is _.25_.

The give-or-take value for the sum is _7.5_ (0.433 times the square root of 300)

Another name for the give-or-take value is _standard error of the sum_

In trying to determine whether the students are doing better than randomly guessing, we are doing a hypothesis test. Should the null hypothesis be that they are just guessing randomly, or that they know something?

They are guessing randomly. We start with a claim that we are trying to show is unreasonable.

To do the test, we need to compute a z-value. The observed sum is 32% of 300, or 96 correct answers, so that value is (96 - 75)/7.5 = 21/7.5 = 2.8

What is the observed level of significance (or p-value)? .0026 is the probability of being more than 2.8 standard errors above the expected sum.
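The whole box-model test can be sketched in a few lines, assuming the normal curve is a good approximation for the sum of 300 draws. NormalDist comes from Python's standard statistics module (Python 3.8 or later); the variable names are ours.

```python
import math
from statistics import NormalDist

n_draws = 20 * 15     # 20 students, 15 questions each
p_guess = 0.25        # one correct option out of four
observed_sum = 96     # 32% of 300 answers were correct

expected_sum = n_draws * p_guess
se_sum = math.sqrt(n_draws * p_guess * (1 - p_guess))   # squareroot(300) * 0.433

z = (observed_sum - expected_sum) / se_sum
p_value = 1 - NormalDist().cdf(z)   # one-sided: chance of doing this well or better by guessing
print(z, p_value)
```

A p-value this small makes it hard to believe the students were only guessing.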
