Solved – Standard deviation around mean rather than mode or median

mathematical-statistics, mean, median, mode, probability

Why is standard deviation calculated from the arithmetic mean and not from other measures of central tendency? I do get that standard deviation measures dispersion, but why not use the mode or median when, at times, one of them might be a better measure of central tendency? That would affect Chebyshev's equation (68% of values lie within +/- one standard deviation of the mean), so is that the reason, or is it something else?

Best Answer

For one, distributions more readily have a finite mean than a median or mode. Primary-school treatments of these concepts for samples can obscure this issue for random variables. For the median to exist, you either need some value at which the CDF is exactly $\frac12$ (which is far from guaranteed for discrete distributions), or you need to define how to fudge a median from the values that get as close as possible. For exactly one mode to exist, you need a unimodal distribution. For the mean to exist, you only need $\int_{\Bbb R}x\,dF(x)$ to be finite, with $F$ the CDF. What's more, if you want to define something analogous to $\Bbb E(X-\mu)^2$ with something replacing $\mu:=\Bbb EX$, why stop there? Do we want to define the "median-variance", for example, as the median of $(X-m)^2$, with $m$ the median of $X$? That adds further constraints on the distributions we can work with, and those constraints are hard to fulfill and hard to compute with. Of course, finite-mean distributions don't have to have finite variance, but you only need slightly lighter tails in the distribution to fix that.
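To make the median issue concrete, here is a small sketch in Python. The two coin distributions are made-up examples, and `support_medians` uses one common way of "fudging" a median (any $m$ with $P(X\le m)\ge\frac12$ and $P(X\ge m)\ge\frac12$); that convention is my assumption, not something fixed by the argument above.

```python
from fractions import Fraction

def cdf_values(pmf):
    """Return [(x, P(X <= x))] over the support, in increasing order of x."""
    total = Fraction(0)
    out = []
    for x in sorted(pmf):
        total += pmf[x]
        out.append((x, total))
    return out

def support_medians(pmf):
    """Support points m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2 (assumed convention)."""
    half = Fraction(1, 2)
    return [m for m in sorted(pmf)
            if sum(p for x, p in pmf.items() if x <= m) >= half
            and sum(p for x, p in pmf.items() if x >= m) >= half]

# Hypothetical examples: a biased coin and a fair coin, coded as 0/1.
biased = {0: Fraction(7, 10), 1: Fraction(3, 10)}
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# No support point of the biased coin has a CDF of exactly 1/2 ...
biased_hits_half = any(F == Fraction(1, 2) for _, F in cdf_values(biased))
# ... yet the inequality-based convention still picks out a median.
biased_medians = support_medians(biased)
# The fair coin does hit 1/2, but then the median is not unique.
fair_medians = support_medians(fair)

print(biased_hits_half, biased_medians, fair_medians)
```

The biased coin never has a CDF of exactly $\frac12$, and the fair coin has more than one candidate median, so in both cases an extra convention is needed before "the median" means anything.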

Another issue is that we like to analyze the relation between two variables with quantities such as the covariance, which has a wonderful geometric interpretation I've discussed on math.se. How do you modify that concept to use something other than means? Say we keep using $\Bbb E$ as a wrapper, just to avoid another difficult question. If $m_X$ is the median of $X$, do you want to define the covariance as $\Bbb EXY-m_Xm_Y$, or as $\Bbb E(X-m_X)(Y-m_Y)=\Bbb EXY-m_X\mu_Y-\mu_Xm_Y+m_Xm_Y$? These aren't equivalent! In light of that geometric interpretation, my guess is you'd prefer the second definition. But it's a really strange one, since it uses the means anyway. It gets even worse if you replace $\Bbb E$ with a median operator or whatever you prefer, since you then won't be able to see it as an inner product any more.
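A quick numerical check of the two candidate definitions, as a sketch only: the paired data below are invented, and treating them as an equally weighted empirical distribution is my assumption. Exact fractions are used so the comparison isn't blurred by rounding.

```python
from fractions import Fraction
from statistics import median

# Hypothetical paired observations, each pair given weight 1/n.
xs = [0, 1, 2, 3, 10]
ys = [1, 0, 4, 2, 20]
n = len(xs)

def expect(values):
    """Expectation under the equally weighted empirical distribution."""
    return Fraction(sum(values), n)

mu_x, mu_y = expect(xs), expect(ys)   # means
m_x, m_y = median(xs), median(ys)     # medians (unique, since n is odd)
e_xy = expect([x * y for x, y in zip(xs, ys)])

# First candidate: E[XY] - m_X m_Y
candidate_a = e_xy - m_x * m_y
# Second candidate: E[(X - m_X)(Y - m_Y)]
candidate_b = expect([(x - m_x) * (y - m_y) for x, y in zip(xs, ys)])
# Expansion of the second candidate; note that the means reappear.
expanded_b = e_xy - m_x * mu_y - mu_x * m_y + m_x * m_y
# Ordinary covariance, for comparison.
usual_cov = e_xy - mu_x * mu_y

print(candidate_a, candidate_b, expanded_b, usual_cov)
```

On this data the two candidates disagree with each other and with the ordinary covariance, while the second candidate matches its expansion in terms of the means.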

Finally, means & variances just arise more naturally when you do statistical theory:

  • The CLT gives an asymptotic distribution for sample means; there's nothing as neat, with such minor assumptions, for the sample median. Similarly, Student's $t$ does something nice for $(\bar{X}-\mu)/(S/\sqrt{n})$ when the sample is normal.
  • The CGF $\ln\Bbb Ee^{tX}$ is $\mu_Xt+\frac12\sigma_X^2t^2+o(t^2)$, and you can write down something similar for the characteristic function and MGF. For discrete distributions, the PGF gives similar insights into these combined moments/cumulants.
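The CLT point can be seen in a rough simulation. This is only a sketch: the choice of an Exponential(1) population (mean 1, standard deviation 1), the sample size, the replicate count, and the seed are all arbitrary assumptions of mine.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def standardized_mean(n):
    """Draw n Exponential(1) values and return sqrt(n) * (sample mean - 1)."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (sum(draws) / n - 1.0)

n, replicates = 50, 4000
z_values = [standardized_mean(n) for _ in range(replicates)]

# For a standard normal, about 95% of the mass lies within +/-1.96.
coverage = sum(abs(z) <= 1.96 for z in z_values) / replicates
print(f"share of standardized means within +/-1.96: {coverage:.3f}")
```

Even though the population is quite skewed, the share should land near 0.95, which is the sort of neat, assumption-light behaviour the first bullet refers to.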

Having said that, other measures of central tendency can have some nice properties: as a function of $a$, $\Bbb E|X-a|^p$ is minimized by $a=\mu$ when $p=2$, but by $a=m$ when $p=1$.
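That last property is easy to check by brute force. The sketch below uses a made-up sample as an equally weighted distribution and a grid of candidate values for $a$ (both are assumptions for illustration); a grid search is not a proof, just a sanity check.

```python
from fractions import Fraction
from statistics import median

# Hypothetical sample, treated as an equally weighted distribution.
xs = [0, 1, 2, 3, 10]
mean_x = Fraction(sum(xs), len(xs))
median_x = median(xs)

def mean_abs_power(a, p):
    """E|X - a|^p under the equally weighted empirical distribution."""
    return sum(abs(x - a) ** p for x in xs) / len(xs)

# Candidate centres a = 0.0, 0.1, ..., 10.0, kept exact with fractions.
grid = [Fraction(i, 10) for i in range(0, 101)]

best_p2 = min(grid, key=lambda a: mean_abs_power(a, 2))  # squared loss
best_p1 = min(grid, key=lambda a: mean_abs_power(a, 1))  # absolute loss

print(best_p2, mean_x)    # minimizer for p = 2 versus the mean
print(best_p1, median_x)  # minimizer for p = 1 versus the median
```

On this sample the grid minimizer for $p=2$ coincides with the mean and the one for $p=1$ coincides with the median.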