Measuring the Center
One of the meanings of the word "statistics" is numbers,
so it should be no surprise that the study of statistics
begins with the organization of numbers.
The human brain does not deal well with masses of
numbers. However, the human brain does deal well with
pictures. A big chunk of the brain does nothing but process
images. Hence, most statistical analysis begins with a
picture. A commonly used picture in statistics is the
histogram, a type of bar graph that shows relative
frequency.
Pictures can start us on a journey of statistical
analysis, but they do not lead very far. Using a more
abstract trail built on mathematics will take us much
further. Although pictures are an intuitive and widely used
first step in analyzing data, they are only a first step.
Mathematicians have found that much more analysis is
possible if we summarize the information in the data into a
few numbers.
One way of summarizing the information in a group of
numbers is to identify a typical or representative number.
Three common measures of this are the arithmetic mean (or
average), the median, and the mode. The simplest of these
three, when it exists, is the mode: the most frequently
occurring number. If we have this set of numbers:
{1, 2, 2, 3, 3, 3, 7}, the mode is 3. If we have this set of
numbers: {1, 2, 3, 4, 5}, there is no mode, because no number
occurs more often than any other. It is possible to have more
than one mode. If the numbers are {2, 2, 2, 3, 3, 3, 7}, then
both 2 and 3 are modes.
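The definition of the mode can be sketched in Python. This is a minimal illustration rather than a standard routine: the function name find_modes is hypothetical, and it assumes the convention used here that a set in which no number repeats has no mode.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequent value(s), or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)          # how many times each number occurs
    highest = max(counts.values())    # the largest count
    if highest == 1:
        return []                     # nothing repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(find_modes([1, 2, 2, 3, 3, 3, 7]))  # [3]
print(find_modes([1, 2, 3, 4, 5]))        # []
print(find_modes([2, 2, 2, 3, 3, 3, 7]))  # [2, 3]
```

Other conventions exist. Python's built-in statistics.multimode, for instance, returns every value when none repeats, so it would report five modes for the second set.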
More useful than the mode is the median, or the middle
number when the numbers are arranged from lowest to highest.
In this set of seven numbers: {1, 2, 2, 3, 3, 3, 7} the
median is 3.
When there is an even number of numbers, the median is
the midpoint or average of the two numbers closest to the
middle. For example, in this set of numbers: {1, 6, 7, 2}
the median is 4 because after they are put into order 1, 2,
6, 7, the two middle numbers are 2 and 6, and their average
or midpoint is (6+2)/2 = 4.
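The two cases for the median, an odd or an even count of numbers, can be sketched in Python as follows. The function name find_median is hypothetical, and the sketch assumes the list contains at least one number.

```python
def find_median(values):
    """Return the middle value of a non-empty list of numbers."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle number
    # even count: the average of the two numbers closest to the middle
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([1, 2, 2, 3, 3, 3, 7]))  # 3
print(find_median([1, 6, 7, 2]))           # 4.0
```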
The final common measure of the middle is the arithmetic
mean, which is also called the mean or the average. It is
found by adding up the set of numbers and dividing by N, the
number of numbers we have added together. In this set of
numbers: {1, 2, 2, 3, 3, 3, 7} the mean is 3 because they
total 21 and 21/7 = 3.
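The same calculation can be written as a short Python sketch. The function name arithmetic_mean is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def arithmetic_mean(values):
    """Add up the numbers and divide by N, the count of numbers."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    total = sum(values)   # 21 for the example set
    n = len(values)       # N = 7 for the example set
    return total / n

print(arithmetic_mean([1, 2, 2, 3, 3, 3, 7]))  # 3.0
```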
The mean is by far the most important of these three
measures for statistical purposes because it has nice
mathematical and statistical properties. The mean and median
are identical if the distribution is symmetrical. If the
distribution is lopsided, with a long tail to either the left
or right (statisticians call this type of distribution
skewed), the mean and median will generally not be the same.
We can see this with a simple example starting with these
numbers: {1, 2, 3, 4, 5}. The mean and median are both 3.
Now let us increase the 5 to 15, so we have these numbers:
{1, 2, 3, 4, 15}. The median is unaffected; it remains 3.
The mean, however, is now 25/5 = 5.
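This comparison can be checked with Python's built-in statistics module. The sketch below uses only statistics.mean and statistics.median; the variable names are illustrative.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 15]   # the 5 has been increased to 15

# For the symmetric set, the mean and median agree.
print(statistics.mean(symmetric), statistics.median(symmetric))

# For the skewed set, the median stays at 3 but the mean rises to 5.
print(statistics.mean(skewed), statistics.median(skewed))
```

Changing a single value at the end of the list moves the mean but leaves the median where it was.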
The last paragraph points to one of the advantages of
using the mean as the measure of the center: it is affected
by every number in the set. The median and mode, in
contrast, are not.