Problems: Review of Univariate
Statistics
We live in an uncertain and risky world. Statistics can
sometimes help us deal with this world.
In studying chance, mathematicians developed the concept
of a probability distribution. The normal curve is one
example. A simple example, but one that is often useful, is
the model of tickets in a box. Suppose we have the following
tickets in a box: 1, 1, 4.
What is the average of the box?
What is the standard deviation of the box?
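A hand calculation for a box like this can be checked with a few lines of Python. This is only a minimal sketch: it assumes the standard deviation of a box is the population SD (dividing by the number of tickets), and the variable names are illustrative.

```python
from statistics import mean, pstdev

# Tickets in the box from the problem above.
box = [1, 1, 4]

box_average = mean(box)   # sum of the tickets / number of tickets
box_sd = pstdev(box)      # population SD: divides by the number of tickets

print(f"average = {box_average}, SD = {box_sd:.3f}")
```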
Suppose we have a box of unknown tickets and we draw out
these tickets: 1, 1, 1, 1, 4, 4. What is the average of this
sample? What is the standard deviation of this sample?
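For a sample, textbooks differ on whether the standard deviation divides by n or by n - 1, and the problem does not say which convention is intended. The sketch below computes both so the two can be compared; treat the choice between them as an assumption to settle with your own course notes.

```python
from statistics import mean, pstdev, stdev

# Tickets drawn from the unknown box.
sample = [1, 1, 1, 1, 4, 4]

sample_average = mean(sample)
sd_divide_by_n = pstdev(sample)          # divides by n
sd_divide_by_n_minus_1 = stdev(sample)   # divides by n - 1

print(f"average = {sample_average}")
print(f"SD (divide by n)     = {sd_divide_by_n:.3f}")
print(f"SD (divide by n - 1) = {sd_divide_by_n_minus_1:.3f}")
```

The two versions differ noticeably for a sample of six and become nearly identical as the sample grows.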
There is some magic in statistics. Although we cannot
predict what will happen on any single draw from a box of
tickets (or when taking one random sample), we can be pretty
sure about what will happen if we take many draws. This is
the result of the central limit theorem and is the basis of
the gambling and insurance industries.
If we draw 200 times from the box of tickets, 1, 1, 4, we
should get a total of _____ give or take about _____. This
second number is the standard error of the sum. We expect an
average of ______ give or take about ________. (This second
number is the standard error of the mean.)
Why do we have this tendency to end up close to the
expected sum or expected average?
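The blanks above follow from two rules: the expected sum is the number of draws times the average of the box, and the standard error of the sum is the square root of the number of draws times the SD of the box. The sketch below applies those rules, assuming the draws are made at random with replacement; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def sum_and_mean_expectations(box, n_draws):
    """Expected value and standard error for the sum and the mean of
    n_draws random draws made with replacement from box."""
    avg = mean(box)
    sd = pstdev(box)   # SD of the box (population formula)
    return {
        "expected_sum": n_draws * avg,
        "se_sum": sqrt(n_draws) * sd,
        "expected_mean": avg,
        "se_mean": sd / sqrt(n_draws),
    }

result = sum_and_mean_expectations([1, 1, 4], 200)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Notice that the expected sum grows in proportion to the number of draws, while its standard error grows only like the square root. Relative to the size of the sum, the chance error shrinks as the draws pile up.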
Statistical inference uses this theory to turn things
around. If we draw 200 times from an unknown population and
get an average of 2.1 and a standard deviation of 1.4, would
we be suspicious of the claim that the average of the box is
in fact 2? How about 2.5? Explain. (This type of procedure
is fundamental in quality control.)
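One common way to judge such a claim is to ask how many standard errors the observed average lies from the claimed average. The sketch below does this, assuming the sample SD is a reasonable stand-in for the unknown SD of the box; the function name is hypothetical, and the cutoff for "suspicious" (often about 2 standard errors) is a convention rather than a rule.

```python
from math import sqrt

def z_for_claimed_average(sample_average, sample_sd, n_draws, claimed_average):
    """Distance between the observed and claimed average, in standard errors."""
    se_mean = sample_sd / sqrt(n_draws)   # standard error of the mean
    return (sample_average - claimed_average) / se_mean

z_vs_2 = z_for_claimed_average(2.1, 1.4, 200, 2.0)
z_vs_2_5 = z_for_claimed_average(2.1, 1.4, 200, 2.5)

print(f"claim of 2.0: z = {z_vs_2:.2f}")
print(f"claim of 2.5: z = {z_vs_2_5:.2f}")
```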
Suppose we have no claims about the unknown population
but just want to know what the average is. How would we
explain what we think the average is and how precise that
estimate is using a confidence interval if the total of the
sample of 100 is 191 and the standard deviation of the
sample is 1.4? (This type of inference is used every time
you see polls about the election that will be happening in
November.)
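A confidence interval is the sample average plus or minus a multiple of the standard error of the mean. The problem does not state a confidence level, so the sketch below assumes 95% as a default; the function name and the `level` parameter are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_total, sample_sd, n_draws, level=0.95):
    """Normal-curve confidence interval for the average of the box."""
    sample_average = sample_total / n_draws
    se_mean = sample_sd / sqrt(n_draws)
    # Multiplier from the normal curve, about 1.96 for a 95% level.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return sample_average - z_star * se_mean, sample_average + z_star * se_mean

low, high = confidence_interval(191, 1.4, 100)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

The width of the interval is what expresses the precision of the estimate: a larger sample gives a smaller standard error and a narrower interval.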
Now that we have had this trip down memory lane, are you
ready to move on to the much more exciting area of
multivariate statistics?
Alternative numbers
1. In studying chance, mathematicians developed the
concept of a probability distribution. The normal curve is
one example. A simple example, but one that is often useful,
is the model of tickets in a box. Suppose we have the
following tickets in a box: 1, 3, 4, 5, 7.
What is the average of the box?
What is the standard deviation of the box?
2. Suppose we have a box of unknown tickets and we draw
out these tickets: 7, 7, 5, 5, 4, 4, 3, 3, 1, 1. What is the
average of this sample? What is the standard deviation of
this sample?
There is some magic in statistics. Although we cannot
predict what will happen on any single draw from a box of
tickets (or when taking one random sample), we can be pretty
sure about what will happen if we take many draws. This is
the result of the central limit theorem and is the basis of
the gambling and insurance industries.
3. If we draw 90 times from the box of tickets,
1, 3, 4, 5, 7, we should get a total of _____ give or take
about _____. This second number is the standard error of the
sum. We expect an average of ______ give or take about
________. (This second number is the standard error of the
mean.)
Why do we have this tendency to end up close to the
expected sum or expected average?
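One way to explore this question is to simulate the experiment many times and see how the sums actually behave. The sketch below repeats the 90 draws a few thousand times and compares the results with the square-root-rule predictions. The number of repetitions and the random seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

box = [1, 3, 4, 5, 7]
n_draws = 90
n_repeats = 5000          # arbitrary number of repeated experiments

rng = random.Random(0)    # fixed seed so the run is repeatable

# Each experiment: draw n_draws tickets with replacement and add them up.
sums = [sum(rng.choices(box, k=n_draws)) for _ in range(n_repeats)]

predicted_sum = n_draws * mean(box)
predicted_se = sqrt(n_draws) * pstdev(box)

print(f"predicted sum {predicted_sum}, average of simulated sums {mean(sums):.1f}")
print(f"predicted SE  {predicted_se:.2f}, SD of simulated sums {pstdev(sums):.2f}")
```

Individual sums vary from run to run, but their typical distance from the expected sum stays close to the predicted standard error.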
4. Statistical inference uses this theory to turn things
around. If we draw 144 times from an unknown population and
get an average of 3.9 and a standard deviation of 2.1, would
we be suspicious of the claim that the average of the box is
in fact 4? How about 4.5? Explain. (This type of procedure
is fundamental in quality control.)
5. Suppose we have no claims about the unknown population
but just want to know what the average is. How would we
explain what we think the average is and how precise that
estimate is using a confidence interval if the total of the
sample of 100 is 419 and the standard deviation of the
sample is 1.9? (This type of inference is used every time
you see polls about the election that will be happening in
November.)
1. Professor Box gave his class of 20 students a
multiple-choice test with 15 questions on it. Each question
had four options, and the average score on the test was just
32% correct. Professor Box wants to know if his students
were just guessing randomly, or if they actually knew
something. His friend Professor Model says that if they put
their heads together, they should be able to figure it out
(a Box-Model solution).
Having 20 students randomly answer 15 questions, each
question with one correct option and three incorrect
options, is like taking 300 draws from the box that has
these tickets: ________________
The average of the box is ______ and its standard
deviation is ______.
If all the students guess randomly, the expected sum is
______ and the expected average is _________.
The give-or-take value for the sum is ____________.
Another name for the give-or-take value is
_________________.
In trying to determine whether the students are doing
better than randomly guessing, we are doing a hypothesis
test. Should the null hypothesis be that they are just
guessing randomly, or that they know something?
To do the test, we need to compute a z-value. That value
is _______.
What is the observed level of significance (or p-value)?
_________
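After working the problem by hand, the steps can be checked with a short script. This is a sketch under stated assumptions, not the official solution: it models each random guess as a draw from a box with one ticket marked 1 (correct) and three marked 0 (incorrect), uses the normal approximation, and computes a one-sided p-value because the question is whether the students did better than guessing.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 0, 0, 0]              # 1 = correct guess, 0 = incorrect guess
n_draws = 20 * 15               # 20 students times 15 questions
observed_sum = 0.32 * n_draws   # 32% of all answers were correct

expected_sum = n_draws * mean(box)
se_sum = sqrt(n_draws) * pstdev(box)   # give-or-take value for the sum

z = (observed_sum - expected_sum) / se_sum
p_value = 1 - NormalDist().cdf(z)      # chance of doing this well by guessing

print(f"expected sum = {expected_sum:.1f}, SE of sum = {se_sum:.2f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```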
