













Adding the absolute deviations gives six, and dividing by five gives 1.2. The results for the other two sets of numbers are 0.8 and 1.2. According to this measure of dispersion, which is called the mean deviation, the first and third sets of numbers are equally dispersed.
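The arithmetic for the mean deviation can be checked with a few lines of Python. This is only a sketch: it takes the first set of numbers to be 1, 2, 3, 4, 5, and the variable names are chosen for illustration.

```python
# First set of numbers (assumed to be 1 through 5).
data = [1, 2, 3, 4, 5]

mean = sum(data) / len(data)

# Absolute deviation of each value from the mean.
abs_devs = [abs(x - mean) for x in data]

# Mean deviation: the average of the absolute deviations.
mean_deviation = sum(abs_devs) / len(data)

print(abs_devs)
print(mean_deviation)
```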
The mean deviation may seem a sensible and obvious way to measure dispersion, but another measure that is less obvious and intuitive has better mathematical properties, so statisticians rarely use the mean deviation. The better measure gets rid of the negative signs by squaring the deviations from the mean. For the first set of numbers the computation is:
x    (x - mean)^{2}
1    4
2    1
3    0
4    1
5    4
Adding the squared deviations gives 10, and dividing by n, the number of observations, gives 10/5 = 2. This measure is called the variance. Taking the square root, which puts us back into the units of the original data, gives us the standard deviation, the most commonly used measure of dispersion in statistics.
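The same computation can be sketched in Python, again for the numbers 1 through 5 and dividing by n. The variable names are illustrative only.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Variance when dividing by n, and its square root.
variance_n = sum(squared_devs) / n
std_n = math.sqrt(variance_n)

print(squared_devs)
print(variance_n)
print(std_n)
```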
However, we have one more complication to toss into this story. If the numbers are a sample, that is, they are generated by some process, the division is not by n but by n - 1. Hence, the variance is not 2 but 10/4 = 2.5, and the standard deviation of these numbers is the square root of 2.5, or approximately 1.58. The variances for the other two sets of numbers are 1 and 4.5, so their standard deviations are 1 and approximately 2.12. Notice that the standard deviation gives more weight to large deviations than the mean deviation does.
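Python's standard `statistics` module offers both versions, which makes the difference between the two divisors easy to see. The sketch below uses the numbers 1 through 5: `pvariance` divides by n, while `variance` and `stdev` divide by n - 1.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Divides the sum of squared deviations by n.
var_divide_by_n = statistics.pvariance(data)

# Divides the sum of squared deviations by n - 1.
var_divide_by_n_minus_1 = statistics.variance(data)

# Square root of the n - 1 variance.
sd_sample = statistics.stdev(data)

print(var_divide_by_n)
print(var_divide_by_n_minus_1)
print(round(sd_sample, 2))
```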
Why divide by n - 1 rather than n? When we are dealing with a sample, we want an estimate of the real variance, the variance of the population or the process. Whether we divide by n or n - 1, any single estimate may be too big or too small, but on average dividing by n gives an estimate that is a bit too small, while dividing by n - 1 gives an estimate that is, on average, correct. In other words, dividing by n gives us a biased estimator of the variance, while dividing by n - 1 gives us an unbiased estimator.
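The bias can be seen exactly in a small, made-up case. The sketch below treats the numbers 1 through 5 as an entire population (an assumption for illustration), lists every possible sample of size 2 drawn with replacement, and averages the two kinds of variance estimate over all of those samples. Fractions are used so the averages are exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: treat these five numbers as the whole population.
population = [1, 2, 3, 4, 5]
N = len(population)
pop_mean = Fraction(sum(population), N)
pop_variance = sum((x - pop_mean) ** 2 for x in population) / N


def sum_squared_devs(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


n = 2  # illustrative sample size
# Every ordered sample of size n, drawn with replacement (25 samples).
samples = list(product(population, repeat=n))

# Average estimate over all samples, for each choice of divisor.
avg_divide_by_n = sum(sum_squared_devs(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = (
    sum(sum_squared_devs(s) / (n - 1) for s in samples) / len(samples)
)

print(pop_variance)
print(avg_divide_by_n)
print(avg_divide_by_n_minus_1)
```

In this setup the average of the n - 1 estimates equals the population variance, while the average of the divide-by-n estimates falls below it.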
Here is another way of thinking about it. When we have a sample, we use up some of the information in the sample to estimate the mean of the population. So we do not have n bits of information left; we have less than that.
