Statistics – Motivation Behind Standard Deviation

Tags: intuition, standard deviation, statistics

Let's take the numbers 0-10. Their mean is 5, and the individual deviations from 5 are
-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5
So the average (magnitude of) deviation from the mean is $30/11 \approx 2.73$.

However, this is not the standard deviation. The standard deviation is $\sqrt{10} \approx 3.16$.
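Both figures are easy to check numerically. Here is a minimal sketch in Python/NumPy (not part of the original question):

```python
import numpy as np

x = np.arange(11)              # the numbers 0-10
mean = x.mean()                # 5.0

mad = np.abs(x - mean).mean()  # mean absolute deviation: 30/11
sd = x.std()                   # population standard deviation (ddof=0): sqrt(10)

print(mad)  # 2.7272...
print(sd)   # 3.1622...
```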

The first, mean-deviation definition is simpler and by far the more intuitive of the two, so I'm sure it's the definition statisticians worked with first. However, for some reason they adopted the second definition instead. What was the reasoning behind that decision?

Best Answer

Your guess is correct: least absolute deviations was the method tried first historically. The first to use it were astronomers attempting to combine observations subject to error. Boscovich published this method and a geometric solution in 1755. It was used later by Laplace in a 1789 work on geodesy; Laplace formulated the problem more mathematically and described an analytical solution.

Legendre appears to be the first to use least squares, doing so as early as 1798 for work in celestial mechanics. However, he supplied no probabilistic justification. A decade later, Gauss (in an 1809 treatise on celestial motion and conic sections) asserted axiomatically that the arithmetic mean was the best way to combine observations, invoked the maximum likelihood principle, and then showed that a probability distribution for which the likelihood is maximized at the mean must be proportional to $\exp(-x^2 / (2 \sigma^2))$ (now called a "Gaussian") where $\sigma$ quantifies the precision of the observations.
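Gauss's argument was analytical, but the key claim (that the Gaussian likelihood of a set of observations is maximized by taking $\mu$ to be their arithmetic mean) is easy to illustrate numerically. A small sketch follows; the observations and $\sigma$ are invented for illustration:

```python
import numpy as np

# Hypothetical measurements of a single quantity, subject to error (made-up data)
obs = np.array([9.8, 10.3, 10.1, 9.6, 10.2])
sigma = 0.3  # assumed, known precision of the observations

def neg_log_likelihood(mu):
    # Negative log of prod_i exp(-(x_i - mu)^2 / (2 sigma^2)),
    # dropping additive terms that do not depend on mu
    return np.sum((obs - mu) ** 2) / (2 * sigma ** 2)

grid = np.linspace(9.0, 11.0, 2001)   # candidate values for the "true" value mu
best = grid[np.argmin([neg_log_likelihood(m) for m in grid])]

print(best, obs.mean())  # both 10.0: the likelihood is maximized at the mean
```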

The likelihood (when the observations are statistically independent) is the product of these Gaussian terms, which, due to the presence of the exponential, is most easily maximized by minimizing the negative of its logarithm. Up to an additive constant, the negative log of the product is the sum of the squares (all divided by a constant $2 \sigma^2$, which will not affect the minimization). Thus, even historically, the method of least squares is intimately tied up with likelihood calculations and averaging. There are plenty of other modern justifications for least squares, of course, but this derivation by Gauss--with the almost magical appearance of the Gaussian, which had first appeared some 70 years earlier in de Moivre's work on sums of Bernoulli variables (the Central Limit Theorem)--is memorable.
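In symbols (my notation, not the answer's): for independent observations $x_1, \dots, x_n$ of a quantity with unknown true value $\mu$,

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right), \qquad -\log L(\mu) = n \log\!\left(\sigma\sqrt{2\pi}\right) + \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2,$$

so maximizing $L$ over $\mu$ is exactly the problem of minimizing the sum of squared deviations $\sum_i (x_i - \mu)^2$.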

This story was researched, and is ably recounted, by Stephen Stigler in his The History of Statistics: The Measurement of Uncertainty before 1900 (1986). Here I have merely given the highlights of parts of chapters 1 and 4.