[Math] The definition of the sample standard deviation

statistics

I am reviewing statistics.
I have been reading the book "Probability and Statistical Inference"
by Robert V. Hogg and Elliot A. Tanis.


There is a statement in the book that says:
(Section: The Mean, Variance, and Standard Deviation)
The sample standard deviation,
$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\geq 0,$$
is a measure of how dispersed the data are from the sample mean.
At this stage of your study of statistics,
it is difficult to get a good understanding or meaning of the standard deviation $s$,
but you can roughly think of it as the average distance of the values $x_1, x_2, \ldots, x_n$
from the mean $\bar{x}$.
This is not true exactly,
for, in general,
$$ s > \frac{1}{n} \sum_{i=1}^{n} |x_i-\bar{x}|, $$
but it is fair to say that $s$ is somewhat larger,
yet of the same magnitude,
as the average of the distances of $x_1, x_2, \ldots, x_n$ from $\bar{x}$.


Question 1:
What book could provide a "good understanding or meaning" of the standard deviation for me?

Question 2:
Why don't we define the sample standard deviation as the
average distance of the values $x_1, x_2, \ldots, x_n$
from the mean $\bar{x}$,
that is,
$$ s:=\frac{1}{n} \sum_{i=1}^{n} |x_i-\bar{x}|. $$
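
As a quick, made-up sanity check of how the two quantities compare, take the data $1, 2, 3, 4, 5$, so that $\bar{x} = 3$:
$$ s = \sqrt{\frac{4+1+0+1+4}{4}} = \sqrt{2.5} \approx 1.58, \qquad \frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}| = \frac{2+1+0+1+2}{5} = 1.2, $$
so $s$ is indeed somewhat larger, but of the same order of magnitude.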

Best Answer

Question 2 is answered in the Wikipedia article about Bessel's correction.

The meaning of the standard deviation is largely the same as the meaning of the mean absolute deviation:

  • Both are translation invariant, i.e. if you add the same number to every data point, you don't change the measure of dispersion, whether it is the standard deviation or the mean absolute deviation; and
  • Both are equivariant under multiplication by non-negative numbers, i.e. if you multiply every data point by the same non-negative number, then you multiply the measure of dispersion by that number. (And if the number you multiply by may be negative, then the measure of dispersion is multiplied by the absolute value of that number.) A short derivation of both properties is sketched just after this list.
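
Both facts follow from one line of algebra: if $y_i = a x_i + b$, then $\bar{y} = a\bar{x} + b$, so $y_i - \bar{y} = a(x_i - \bar{x})$, and therefore
$$ s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} a^2(x_i-\bar{x})^2} = |a|\, s_x, \qquad \frac{1}{n}\sum_{i=1}^{n} |y_i-\bar{y}| = |a|\cdot\frac{1}{n}\sum_{i=1}^{n} |x_i-\bar{x}|. $$
Taking $a = 1$ gives the translation invariance, and taking $b = 0$ gives the behaviour under rescaling.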

However, the standard deviation enjoys one great advantage over the mean absolute deviation: the variance (the square of the standard deviation) of the sum of independent random variables is the sum of their variances. For example, suppose you toss a coin $1800$ times. What is the variance of the probability distribution of the number of heads? You can find it easily, whereas you can't do the same with the mean absolute deviation. That makes it possible to find the probability that the number of heads is between $895$ and $912$ by using the central limit theorem.
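
To spell the coin-toss example out (counting each head as $1$ and each tail as $0$, so a single fair toss has variance $\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}$), additivity of the variances of the $1800$ independent tosses gives
$$ \operatorname{Var}(X) = 1800\cdot\tfrac{1}{4} = 450, \qquad \sigma_X = \sqrt{450} \approx 21.2, $$
and the central limit theorem (with a continuity correction) then gives roughly
$$ P(895 \le X \le 912) \approx \Phi\!\left(\tfrac{912.5-900}{21.2}\right) - \Phi\!\left(\tfrac{894.5-900}{21.2}\right) \approx \Phi(0.59) - \Phi(-0.26) \approx 0.32. $$
There is no comparably simple additivity rule for the mean absolute deviation, which is why the same route is not available there.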

A subtler advantage is also enjoyed by the standard deviation. Suppose the population standard deviation is $\sigma$, and you can't observe it but must estimate it based on a sample. You can multiply the sample standard deviation by a particular constant (I don't remember its value offhand) to get an unbiased estimate of $\sigma$, and you can do the same with the mean absolute deviation and a different constant. Which one is more accurate, in the sense of having a smaller mean square error? It is the sample standard deviation, if the population is normally distributed. However, one way in which this advantage is subtler is that it may be lost if there is a slight deviation from normality. I seem to recall that with a mixture of normal distributions with different variances, with mixing weights $0.99$ and $0.01$, that advantage may be lost.
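
As an illustrative sketch (not from the answer above), the efficiency comparison in the last paragraph can be checked by simulation. The snippet below estimates the two unbiasing constants empirically at $\sigma = 1$, so their closed-form values never need to be quoted, and then compares the mean square errors of the two scaled estimators for a normal population; the sample size n = 10 and the other settings are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10          # sample size (arbitrary choice)
    sigma = 2.0     # true population standard deviation (arbitrary choice)
    reps = 200_000  # Monte Carlo repetitions

    def sample_sd(x):
        # usual sample standard deviation, divisor n - 1
        return x.std(axis=1, ddof=1)

    def mean_abs_dev(x):
        # mean absolute deviation from the sample mean
        return np.abs(x - x.mean(axis=1, keepdims=True)).mean(axis=1)

    # Calibration: estimate E[s] and E[MAD] at sigma = 1 by simulation,
    # instead of quoting the closed-form unbiasing constants.
    cal = rng.standard_normal((reps, n))
    c_sd = sample_sd(cal).mean()
    c_mad = mean_abs_dev(cal).mean()

    # Comparison: draw normal samples with the true sigma and compare the
    # mean square errors of the two (approximately) unbiased estimators.
    x = sigma * rng.standard_normal((reps, n))
    est_sd = sample_sd(x) / c_sd
    est_mad = mean_abs_dev(x) / c_mad
    print("MSE, scaled sample SD       :", np.mean((est_sd - sigma) ** 2))
    print("MSE, scaled mean |deviation|:", np.mean((est_mad - sigma) ** 2))

Under normality the scaled sample standard deviation should come out with the smaller mean square error; replacing the normal draws with a $0.99$/$0.01$ mixture of normals with very different variances is one way to probe the reversal mentioned above.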
