Answers: Review of Univariate
Statistics
We live in an uncertain and risky world. Statistics can
sometimes help us deal with this world.
In studying chance, mathematicians developed the concept
of a probability distribution. The normal curve is one
example. A simple example, but one that is often useful, is
the model of tickets in a box. Suppose we have the following
tickets in a box: 1, 1, 4.
What is the average of the box?
6/3 = 2
What is the standard deviation of the box?
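The deviations from the average are -1, -1, and 2, so the
squared deviations are 1, 1, and 4:
squareroot((1 + 1 + 4)/3) =
squareroot(6/3) = squareroot(2), or about 1.41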
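The box calculation above can be checked with a short Python sketch. It assumes the box is treated as a complete population, so the standard deviation divides by the number of tickets. The variable names are illustrative only.

```python
import math
import statistics

# Tickets in the box, treated as the whole population.
box = [1, 1, 4]

box_average = statistics.mean(box)  # 6/3
box_sd = statistics.pstdev(box)     # population SD: divides by the number of tickets

print("average of the box:", box_average)
print("SD of the box:", box_sd, "which should match", math.sqrt(2))
```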
Suppose we have a box of unknown tickets and we draw out
these tickets: 1, 1, 1, 1, 4, 4. What is the average of this
sample? What is the standard deviation of this sample?
Average is 12/6 = 2; variance
is (1 + 1 + 1 + 1 + 4 + 4)/5 = 12/5 = 2.4, dividing by 5
(one less than the sample size) because this is a sample;
standard deviation is squareroot(2.4), or about 1.55
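As a rough check on the sample calculation, the sketch below uses Python's `statistics` module, whose `variance` and `stdev` functions divide by one less than the sample size. The variable names are assumptions made for illustration.

```python
import statistics

# The six tickets drawn from the unknown box.
draws = [1, 1, 1, 1, 4, 4]

sample_average = statistics.mean(draws)       # 12/6
sample_variance = statistics.variance(draws)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(draws)           # square root of the sample variance

print(sample_average, sample_variance, sample_sd)
```

Contrast this with `statistics.pstdev`, which divides by n and is the right choice only when the list is the entire box.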
There is some magic in statistics. Although we cannot
predict what will happen on any single draw from a box of
tickets (or when taking one random sample), we can be pretty
sure about what will happen if we take many draws. This is
the result of the central limit theorem and is the basis of
the gambling and insurance industries.
If we draw 200 times from the box of tickets, 1, 1, 4, we
should get a total of __400__ give or take about __20__.
This second number is the standard error of the sum:
squareroot(200)*squareroot(2) = squareroot(400) = 20. We
expect an average of __2__ give or take about __0.1__.
(This second number is the standard error of the mean:
20/200 = 0.1.)
Why do we have this tendency to end up close to the
expected sum or expected average?
Because there are a great many
sequences that will give us a sum close to 400 relative to
the number that give us sums that are far
away.
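The tendency described above can be seen in a small simulation. This is only a sketch: the number of repetitions (2000) and the fixed random seed are arbitrary assumptions chosen so the run is repeatable, not part of the original problem.

```python
import math
import random
import statistics

box = [1, 1, 4]
draws_per_sum = 200
repetitions = 2000  # assumption: enough repetitions to see the pattern

# Values predicted by the box model.
expected_sum = draws_per_sum * statistics.mean(box)
se_sum = math.sqrt(draws_per_sum) * statistics.pstdev(box)
se_mean = se_sum / draws_per_sum

# Draw with replacement many times and record each total.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sums = [sum(rng.choices(box, k=draws_per_sum)) for _ in range(repetitions)]

simulated_average_sum = statistics.mean(sums)
simulated_sd_of_sums = statistics.pstdev(sums)
share_within_two_se = sum(
    abs(s - expected_sum) <= 2 * se_sum for s in sums
) / repetitions

print("expected sum:", expected_sum, "SE of sum:", se_sum, "SE of mean:", se_mean)
print("simulated average of sums:", simulated_average_sum)
print("simulated SD of sums:", simulated_sd_of_sums)
print("share of sums within 2 SE:", share_within_two_se)
```

Most simulated totals should land within two standard errors of 400, which is the pattern the answer above describes.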
Statistical inference uses this theory to turn things
around. If we draw 200 times from an unknown population and
get an average of 2.1 and a standard deviation of 1.4, would
we be suspicious of the claim that the average of the box is
in fact 2? How about 2.5? Explain. (This type of procedure
is fundamental in quality control.)
The standard error of the
mean would be about 0.1 (1.4 divided by the square root of
200). So 2 and 2.1 are only one standard
error apart, and it is quite likely that we would get a
result like 2.1 by random chance if the true mean is 2.
However, 2.5 is 4 standard errors away from 2.1, and it is
very unlikely that we would get a result that far away from
2.5 by random chance. Hence, we would doubt that 2.5 is the
true mean if we got a sample mean of 2.1 and a sample
standard deviation of 1.4.
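The comparison above amounts to measuring the gap between the sample mean and the claimed mean in standard errors. A minimal sketch, assuming a hypothetical helper named `standard_errors_from_claim`:

```python
import math


def standard_errors_from_claim(sample_mean, sample_sd, n, claimed_mean):
    """Return how many standard errors the sample mean lies from the claim."""
    se_mean = sample_sd / math.sqrt(n)  # standard error of the mean
    return (sample_mean - claimed_mean) / se_mean


# 200 draws, sample average 2.1, sample SD 1.4.
z_for_2 = standard_errors_from_claim(2.1, 1.4, 200, 2.0)
z_for_2_5 = standard_errors_from_claim(2.1, 1.4, 200, 2.5)

print("claim of 2.0:", round(z_for_2, 2), "standard errors")
print("claim of 2.5:", round(z_for_2_5, 2), "standard errors")
```

A value near 1 in absolute size is unremarkable, while a value near 4 is very hard to attribute to chance.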
Suppose we have no claims about the unknown population
but just want to know what the average is. How would we
explain what we think the average is and how precise that
estimate is using a confidence interval if the total of the
sample of 100 is 191 and the standard deviation of the
sample is 1.4? (This type of inference is used every time
you see polls about the election that will be happening in
November.)
The sample average is 191/100 = 1.91. The standard error of
the mean is 0.14, which is 1.4 divided by the square root of
100. We expect to be within two standard errors of the true
mean about 95% of the time, so a 95% confidence interval
would be 1.91 ± 2*(0.14), or 1.91 ± 0.28
Now that we have had this trip down memory lane, are you
ready to move on to the much more exciting area of
multivariate statistics?
Alternative numbers
1. In studying chance, mathematicians developed the
concept of a probability distribution. The normal curve is
one example. A simple example, but one that is often useful,
is the model of tickets in a box. Suppose we have the
following tickets in a box: 1, 3, 4, 5, 7.
What is the average of the box?
What is the standard deviation of the box?
mean: 20/5 = 4; variance is
0.2*(9 + 1 + 0 + 1 + 9) = 4; standard deviation is 2.
2. Suppose we have a box of unknown tickets and we draw
out these tickets: 7, 7, 5, 5, 4, 4, 3, 3, 1, 1. What is the
average of this sample? What is the standard deviation of
this sample?
mean: 40/10 = 4; variance is
40/9; standard deviation is the square root of the
variance.
There is some magic in statistics. Although we cannot
predict what will happen on any single draw from a box of
tickets (or when taking one random sample), we can be pretty
sure about what will happen if we take many draws. This is
the result of the central limit theorem and is the basis of
the gambling and insurance industries.
3. If we draw 90 times from the box of tickets,
1, 3, 4, 5, 7, we should get a total of __360__ give or
take about __19__ (2*squareroot(90) is about 18.97).
This second number is the standard error of the sum. We
expect an average of __4__ give or take about __0.21__
(2/squareroot(90)). (This
second number is the standard error of the mean.)
Why do we have this tendency to end up close to the
expected sum or expected average?
See response on previous
problem.
4. Statistical inference uses this theory to turn things
around. If we draw 144 times from an unknown population and
get an average of 3.9 and a standard deviation of 2.1, would
we be suspicious of the claim that the average of the box is
in fact 4? How about 4.5? Explain. (This type of procedure
is fundamental in quality control.)
We would not be suspicious of
the claim that the average is 4. The standard error of the
mean is 2.1/12, or 0.175. Our sample average of 3.9 is less
than one standard error away from the claim, and if the
claim is true, this is the sort of result we would expect.
The small difference may be due to random chance. We would
be suspicious of the claim that the average is 4.5 because
our sample average is 0.6/0.175, or about 3.4, standard
errors away from that claim, and this would happen by random
chance less than one percent of the time.
5. Suppose we have no claims about the unknown population
but just want to know what the average is. How would we
explain what we think the average is and how precise that
estimate is using a confidence interval if the total of the
sample of 100 is 419 and the standard deviation of the
sample is 1.9?
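4.19 ± 2*(0.19), or 4.19 ± 0.38. The sample average is
419/100 = 4.19, and the standard error of the mean is
1.9/squareroot(100) = 0.19.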
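The interval above can be reproduced with a short sketch. It assumes the rule of thumb used in this review (plus or minus two standard errors for roughly 95% confidence), and the function name `confidence_interval_95` is hypothetical.

```python
import math


def confidence_interval_95(sample_total, n, sample_sd):
    """Approximate 95% interval: sample mean plus or minus two standard errors."""
    sample_mean = sample_total / n
    se_mean = sample_sd / math.sqrt(n)
    margin = 2 * se_mean
    return sample_mean - margin, sample_mean + margin


# Sample of 100 with a total of 419 and a sample SD of 1.9.
low, high = confidence_interval_95(419, 100, 1.9)
print("approximate 95% interval:", round(low, 2), "to", round(high, 2))
```

A more exact interval would use 1.96 standard errors instead of 2; the difference is small at this level of precision.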
1. Professor Box gave his class of 20 students a
multiple-choice test with 15 questions on it. Each question
had four options, and the average score on the test was just
32% correct. Professor Box wants to know if his students
were just guessing randomly, or if they actually knew
something. His friend Professor Model says that if they put
their heads together, they should be able to figure it out
(a Box-Model solution).
Having 20 students randomly answer 15 questions, each
question with one correct option and three incorrect
options, is like taking 300 draws from the box that has
these tickets: __1, 0, 0, 0__
The average of the box is __0.25__ and its
standard deviation is __0.433__ (square root of 0.25*0.75).
If all the students guess randomly, the expected sum is
__75__ and the expected average is __0.25__.
The give-or-take value for the sum is __7.5__
(squareroot(300)*0.433).
Another name for the give-or-take value is
__standard error of the sum__.
In trying to determine whether the students are doing
better than randomly guessing, we are doing a hypothesis
test. Should the null hypothesis be that they are just
guessing randomly, or that they know something?
They are guessing randomly.
We start with a claim that we are trying to show is
unreasonable.
To do the test, we need to compute a z-value. That value
is (96 - 75)/7.5 = 21/7.5 = 2.8, where 96 is the observed
number of correct answers (32% of 300).
What is the observed level of significance (or p-value)?
0.0026 is the probability of
being more than 2.8 standard errors above the expected
sum.
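The whole box-model test can be sketched in a few lines of Python. The one-sided p-value is taken from the normal curve using `math.erfc`; treating the sum of 300 draws as approximately normal is the assumption behind the z-value, and the variable names are illustrative.

```python
import math

# Box for one randomly guessed question: one correct ticket, three incorrect.
box = [1, 0, 0, 0]
draws = 20 * 15    # 20 students, 15 questions each
observed_sum = 96  # 32% of 300 answers were correct

box_average = sum(box) / len(box)
# Shortcut for a 0-1 box: SD = sqrt(fraction of 1s * fraction of 0s).
box_sd = math.sqrt(box_average * (1 - box_average))

expected_sum = draws * box_average
se_sum = math.sqrt(draws) * box_sd  # give-or-take value for the sum

z = (observed_sum - expected_sum) / se_sum
# Area under the standard normal curve to the right of z.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print("expected sum:", expected_sum, "SE of sum:", se_sum)
print("z:", round(z, 2), "one-sided p-value:", round(p_value, 4))
```

Because the p-value is far below common cutoffs such as 0.05, random guessing is a poor explanation for 96 correct answers.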
