Solved – Jensen's inequality and the bias of the sample standard deviation

bias, estimators, probability-inequalities, standard-deviation

I am currently studying Introduction to Probability, second edition, by Blitzstein and Hwang. While studying Jensen's inequality, I came across the following example:

Example 10.1.6 (Bias of sample standard deviation). Let $X_1, \dots, X_n$ be i.i.d. random variables with variance $\sigma^2$. Recall from Theorem 6.3.4 that the sample variance $S^2_n$ is unbiased for estimating $\sigma^2$. That is, $E(S^2_n) = \sigma^2$. However, we are often more interested in estimating the standard deviation $\sigma$. A natural estimator for $\sigma$ is the sample standard deviation, $S_n$. Jensen’s inequality shows us that $S_n$ is biased for estimating $\sigma$. Moreover, it tells us which way the inequality goes:

$$E(S_n) = E(\sqrt{S^2_n}) \le \sqrt{E(S^2_n)} = \sigma,$$

so the sample standard deviation tends to underestimate the true standard deviation. How biased it is depends on the distribution (so there is no universal way to fix the bias, in contrast to the fact that defining sample variance with $n − 1$ in the denominator makes it unbiased for all distributions). Fortunately, the bias is typically minor if the sample size is reasonably large. $\square$

I'm wondering whether there is a (perhaps universal) way to measure the amount of this bias. But, thinking about it further, I suspect that if we could measure the bias exactly, then we would be able to account for it exactly (and therefore fix it), right?
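For concreteness, here is a small simulation of the underestimation (my own illustration; it assumes normal data, and the sample size, seed, and number of replications are arbitrary choices):

```python
# Monte Carlo illustration that E(S_n) < sigma for small samples.
# Assumes N(0, sigma^2) data; all constants below are arbitrary demo choices.
import random
import statistics

random.seed(42)

sigma = 1.0   # true standard deviation
n = 5         # small sample size, where the bias is most visible
reps = 20000  # number of simulated samples

# Average the sample standard deviation over many simulated samples.
mean_s = statistics.fmean(
    statistics.stdev(random.gauss(0.0, sigma) for _ in range(n))
    for _ in range(reps)
)
print(f"average S_n over {reps} samples: {mean_s:.4f} (true sigma = {sigma})")
```

With these settings the average of $S_n$ comes out noticeably below $\sigma = 1$, consistent with the Jensen's-inequality argument in the example.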

I would appreciate it if people would please take the time to elaborate on this.

Best Answer

The text you quote from Blitzstein and Hwang already tells you that there is no universal way to fix the bias:

How biased it is depends on the distribution (so there is no universal way to fix the bias

In the canonical case that the $X_i$ are normally distributed, there is an analytic formula for the bias in terms of the gamma function. For most other distributions, the bias has to be computed numerically.
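To make the normal case concrete: under normality, $E(S_n) = c_4(n)\,\sigma$ with $c_4(n) = \sqrt{2/(n-1)}\;\Gamma(n/2)/\Gamma((n-1)/2)$, so $S_n/c_4(n)$ is unbiased for $\sigma$ (for normal data only). A short sketch computing this factor with the standard library:

```python
# Bias factor c4(n) of the sample standard deviation under normality:
# E(S_n) = c4(n) * sigma, with c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2).
# Using lgamma (log-gamma) keeps the computation stable for large n.
from math import exp, lgamma, sqrt

def c4(n: int) -> float:
    """Expected value of S_n as a fraction of sigma, for i.i.d. normal data."""
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

for n in (2, 5, 10, 100):
    print(f"n = {n:3d}: c4(n) = {c4(n):.6f}")
```

The factor increases toward 1 as $n$ grows, which is the "bias is typically minor if the sample size is reasonably large" remark in the quoted example.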

In practice, statisticians almost never correct $S_n$ for bias, and not because we can't, but because there is no advantage in doing so. Amongst other considerations, fixing the bias of $S_n$ would generally increase the mean-square error, as the reduction in bias is offset by an increase in variance.

Unbiasedness is an unambiguously desirable property only for estimators with symmetric distributions. For estimators with skewed distributions, unbiasedness is more a mathematical curiosity than an essential property. For estimators with right-skewed distributions (like $S_n$), the mean-square error can often be reduced by introducing a slight negative bias.

Proof that fixing the bias increases the MSE

I don't know any references that discuss mean-square-error (MSE) for the standard deviation, but the fact that the mean-corrected estimator has a higher MSE is easy to prove from first principles.

Suppose that $$E(S_n)=(1-\delta)\sigma$$ with $0<\delta<1$. We can conclude that \begin{eqnarray} {\rm var}(S_n)&=&E(S_n^2)-E(S_n)^2\\ &=&\sigma^2-(1-\delta)^2\sigma^2\\ &=&(2\delta-\delta^2)\sigma^2 \end{eqnarray}

The MSE of $S_n$ is \begin{eqnarray} {\rm MSE}(S_n)&=&E\left([S_n-\sigma]^2\right)\\ &=&{\rm var}(S_n)+{\rm bias}(S_n)^2\\ &=&(2\delta-\delta^2)\sigma^2 + (\delta\sigma)^2\\ &=&2\delta\sigma^2 \end{eqnarray}

The mean-corrected estimator is $$\tilde S_n=S_n/(1-\delta).$$ The MSE of $\tilde S_n$ is \begin{eqnarray} {\rm MSE}(\tilde S_n)&=&{\rm var}(\tilde S_n)+{\rm bias}(\tilde S_n)^2\\ &=&{\rm var}(S_n)/(1-\delta)^2+0^2\\ &=&(2\delta-\delta^2)\sigma^2/(1-\delta)^2\\ &=&(2\delta-\delta^2)\sigma^2\{1+2\delta+3\delta^2+O(\delta^3)\}\\ &=&\{2\delta+3\delta^2+O(\delta^3)\}\sigma^2\\ &=&{\rm MSE}(S_n) + \{3\delta^2+O(\delta^3)\}\sigma^2 \end{eqnarray} which is greater than ${\rm MSE}(S_n)$.
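The comparison can be checked numerically from the exact expressions derived above (the value of $\delta$ below is an arbitrary illustrative choice, not an estimate):

```python
# Numeric check of the MSE comparison: with sigma = 1 and a small bias
# fraction delta, the exact MSE of the mean-corrected estimator exceeds
# MSE(S_n) = 2*delta*sigma^2 by roughly 3*delta^2*sigma^2.
sigma = 1.0
delta = 0.05  # assumed bias fraction: E(S_n) = (1 - delta) * sigma

mse_s = 2 * delta * sigma**2                                        # MSE(S_n)
mse_corrected = (2 * delta - delta**2) * sigma**2 / (1 - delta)**2  # MSE(S_n/(1-delta))

print(f"MSE(S_n) = {mse_s:.6f}")
print(f"MSE(corrected) = {mse_corrected:.6f}")
print(f"difference = {mse_corrected - mse_s:.6f} (approx 3*delta^2 = {3 * delta**2:.6f})")
```

The difference matches the $3\delta^2\sigma^2$ leading term up to the $O(\delta^3)$ remainder.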
