Why is the sample standard deviation a biased estimator of $\sigma$?

Tags: estimation, standard-deviation

According to the Wikipedia article on unbiased estimation of standard deviation, the sample SD

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}$$

is a biased estimator of the SD of the population. It states that $E(\sqrt{s^2}) \neq \sqrt{E(s^2)}$.

NB: The random variables are independent, and each $x_{i} \sim N(\mu,\sigma^{2})$.
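A quick Monte Carlo sketch can make the claim $E(\sqrt{s^2}) \neq \sqrt{E(s^2)}$ concrete under the normality assumption above. The function name `simulate_sd_bias` and the settings ($n = 5$, $\mu = 0$, $\sigma = 2$, 50,000 replications, a fixed seed) are illustrative choices, not part of the original question; `statistics.variance` uses the $n - 1$ divisor, matching $s^2$.

```python
import random
import statistics

def simulate_sd_bias(n=5, mu=0.0, sigma=2.0, reps=50_000, seed=1):
    """Monte Carlo averages of s^2 and s over `reps` i.i.d. N(mu, sigma^2) samples of size n.

    Hypothetical helper for illustration; parameter values are arbitrary choices.
    """
    rng = random.Random(seed)
    total_var = 0.0
    total_sd = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(xs)  # sample variance, divides by n - 1
        total_var += s2
        total_sd += s2 ** 0.5         # s = sqrt(s^2)
    return total_var / reps, total_sd / reps

mean_var, mean_sd = simulate_sd_bias()
print(f"average s^2 = {mean_var:.3f}   (sigma^2 = 4)")
print(f"average s   = {mean_sd:.3f}   (sigma   = 2)")
```

With these settings the average of $s^2$ should land close to $\sigma^2 = 4$, while the average of $s$ stays noticeably below $\sigma = 2$.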

My question is two-fold:

  • What is the proof that it is biased?
  • How does one compute the expectation of the sample standard deviation?

My knowledge of maths/stats is only intermediate.

Best Answer

@NRH's answer to this question gives a nice, simple proof of the biasedness of the sample standard deviation. Here I will explicitly calculate the expectation of the sample standard deviation (the original poster's second question) from a normally distributed sample, at which point the bias is clear.

The unbiased sample variance of a set of points $x_1, ..., x_n$ is

$$ s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \overline{x})^2 $$
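As a minimal sketch, the formula translates directly into Python; `sample_variance` is a hypothetical helper name and the data are made up. Python's `statistics.variance` also divides by $n - 1$, so the two results should agree.

```python
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1); hypothetical helper mirroring the formula."""
    n = len(xs)
    if n < 2:
        raise ValueError("the n - 1 divisor needs at least two points")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example points
print(sample_variance(data))      # 32/7, about 4.5714
print(statistics.variance(data))  # the standard library uses the same n - 1 divisor
```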

If the $x_i$'s are independent and normally distributed, it is a standard result (a consequence of Cochran's theorem) that

$$ \frac{(n-1)s^2}{\sigma^2} \sim \chi^{2}_{n-1} $$

where $\sigma^2$ is the true variance. The $\chi^2_{k}$ distribution has, for $x > 0$, the probability density

$$ p(x) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1}e^{-x/2} $$
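The distributional fact above can be checked by simulation, assuming the same i.i.d. normal setup. This sketch (the helper `simulate_q` and its parameter values are illustrative) draws many samples, forms $Q = (n-1)s^2/\sigma^2$, and compares its empirical mean and variance with $k$ and $2k$, the mean and variance of a $\chi^2_k$ variable with $k = n - 1$. The nonzero $\mu$ is deliberate: $Q$ does not depend on it.

```python
import random
import statistics

def simulate_q(n=6, mu=10.0, sigma=3.0, reps=40_000, seed=2):
    """Return `reps` draws of Q = (n-1) s^2 / sigma^2 from N(mu, sigma^2) samples of size n.

    Hypothetical helper for illustration; parameter values are arbitrary choices.
    """
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        qs.append((n - 1) * statistics.variance(xs) / sigma ** 2)
    return qs

qs = simulate_q()
k = 6 - 1  # degrees of freedom for n = 6
# A chi-squared variable with k degrees of freedom has mean k and variance 2k.
print(f"mean of Q     = {statistics.fmean(qs):.3f}  (k  = {k})")
print(f"variance of Q = {statistics.variance(qs):.3f}  (2k = {2 * k})")
```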

Using this density, we can derive the expected value of $s$:

$$ \begin{align} E(s) &= \sqrt{\frac{\sigma^2}{n-1}} E \left( \sqrt{\frac{s^2(n-1)}{\sigma^2}} \right) \\ &= \sqrt{\frac{\sigma^2}{n-1}} \int_{0}^{\infty} \sqrt{x} \frac{(1/2)^{(n-1)/2}}{\Gamma((n-1)/2)} x^{((n-1)/2) - 1}e^{-x/2} \ dx \end{align} $$

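which follows from the definition of expected value and the fact that $\sqrt{\frac{s^2(n-1)}{\sigma^2}}$ is the square root of a $\chi^2_{n-1}$-distributed variable. The trick now is to combine the powers of $x$, using $\sqrt{x}\, x^{\frac{n-1}{2} - 1} = x^{\frac{n}{2} - 1}$, and to rearrange constants so that the integrand becomes a constant multiple of another $\chi^2$ density, this time with $n$ degrees of freedom: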
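The integral can also be evaluated numerically as a check on the derivation. In this sketch, `chi2_pdf`, `midpoint`, and `expected_s_by_integration` are hypothetical helpers; truncating the infinite upper limit at 400 and using the midpoint rule are simplifying assumptions. `expected_s_closed_form` implements the closed form the answer arrives at below, so the two numbers printed for each $n$ should agree to several decimal places.

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom for x > 0, computed in log space."""
    if x <= 0:
        return 0.0
    return math.exp(-(k / 2) * math.log(2) - math.lgamma(k / 2)
                    + (k / 2 - 1) * math.log(x) - x / 2)

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def expected_s_by_integration(sigma, n, upper=400.0):
    """sqrt(sigma^2 / (n-1)) times the integral of sqrt(x) * chi2_pdf(x, n-1) over (0, upper).

    The upper limit stands in for infinity; the chi-squared tail beyond it is
    negligible for the small n used here.
    """
    k = n - 1
    integral = midpoint(lambda x: math.sqrt(x) * chi2_pdf(x, k), 0.0, upper)
    return math.sqrt(sigma ** 2 / k) * integral

def expected_s_closed_form(sigma, n):
    """sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)."""
    return sigma * math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

for n in (2, 3, 5, 10):
    print(n, expected_s_by_integration(1.0, n), expected_s_closed_form(1.0, n))
```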

$$ \begin{align} E(s) &= \sqrt{\frac{\sigma^2}{n-1}} \int_{0}^{\infty} \frac{(1/2)^{(n-1)/2}}{\Gamma(\frac{n-1}{2})} x^{(n/2) - 1}e^{-x/2} \ dx \\ &= \sqrt{\frac{\sigma^2}{n-1}} \cdot \frac{ \Gamma(n/2) }{ \Gamma( \frac{n-1}{2} ) } \int_{0}^{\infty} \frac{(1/2)^{(n-1)/2}}{\Gamma(n/2)} x^{(n/2) - 1}e^{-x/2} \ dx \\ &= \sqrt{\frac{\sigma^2}{n-1}} \cdot \frac{ \Gamma(n/2) }{ \Gamma( \frac{n-1}{2} ) } \cdot \frac{ (1/2)^{(n-1)/2} }{ (1/2)^{n/2} } \underbrace{ \int_{0}^{\infty} \frac{(1/2)^{n/2}}{\Gamma(n/2)} x^{(n/2) - 1}e^{-x/2} \ dx}_{\chi^2_n \ {\rm density} } \end{align} $$

Now we know the integral in the last line equals 1, since its integrand is the $\chi^2_{n}$ density. Simplifying the constants, using $(1/2)^{(n-1)/2} / (1/2)^{n/2} = \sqrt{2}$, gives

$$ E(s) = \sigma \cdot \sqrt{ \frac{2}{n-1} } \cdot \frac{ \Gamma(n/2) }{ \Gamma( \frac{n-1}{2} ) } $$
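The closed form is easy to evaluate. The ratio $E(s)/\sigma$ is commonly written $c_4(n)$; this sketch (function names hypothetical) computes it through `math.lgamma`, because `math.gamma` overflows for arguments above roughly 171, i.e. once $n$ exceeds a few hundred. The two smallest cases have exact values, $c_4(2) = \sqrt{2/\pi}$ and $c_4(3) = \sqrt{\pi}/2$, which serve as a check.

```python
import math

def c4(n):
    """E(s) / sigma for an i.i.d. normal sample of size n.

    Uses lgamma so the Gamma ratio stays finite for large n;
    math.gamma(n / 2) alone overflows a float once n is in the hundreds.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)

def expected_s(sigma, n):
    """E(s) = sigma * c4(n)."""
    return sigma * c4(n)

print(c4(2), math.sqrt(2 / math.pi))  # Gamma(1)/Gamma(1/2) = 1/sqrt(pi)
print(c4(3), math.sqrt(math.pi) / 2)  # Gamma(3/2)/Gamma(1) = sqrt(pi)/2
print(expected_s(2.0, 5))
print(c4(1000))                       # fine here; math.gamma(500) would overflow
```

Under the same normality assumption, dividing $s$ by $c_4(n)$ gives an estimator whose expectation is exactly $\sigma$.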

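Therefore, writing the bias as $\sigma - E(s)$ (so that a positive value means $s$ underestimates $\sigma$ on average), the bias of $s$ is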

$$ \sigma - E(s) = \sigma \bigg(1 - \sqrt{ \frac{2}{n-1} } \cdot \frac{ \Gamma(n/2) }{ \Gamma( \frac{n-1}{2} ) } \bigg) \sim \frac{\sigma}{4 n} \>$$ as $n \to \infty$.
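A numeric check of the asymptotic statement: this sketch tabulates $\sigma - E(s)$ next to $\sigma/(4n)$ for $\sigma = 1$, the same two curves shown in the plot. The helper `c4` is redefined so the block runs on its own, and the chosen values of $n$ are arbitrary. The ratio column should approach 1 as $n$ grows.

```python
import math

def c4(n):
    """E(s) / sigma for a normal sample of size n (computed with lgamma to avoid overflow)."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

sigma = 1.0
print(f"{'n':>6} {'sigma - E(s)':>14} {'sigma/(4n)':>12} {'ratio':>8}")
for n in (2, 5, 10, 50, 100, 1000, 10000):
    bias = sigma * (1 - c4(n))  # sigma - E(s)
    approx = sigma / (4 * n)    # leading-order asymptotic term
    print(f"{n:>6} {bias:>14.6f} {approx:>12.6f} {bias / approx:>8.4f}")
```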

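This bias is strictly positive for every finite $n \ge 2$: the square root is strictly concave and $s^2$ is not almost surely constant, so Jensen's inequality gives $E(s) = E(\sqrt{s^2}) < \sqrt{E(s^2)} = \sigma$. This proves that the sample standard deviation is biased, underestimating $\sigma$ on average. Below, the bias is plotted as a function of $n$ for $\sigma = 1$ (red), along with $1/(4n)$ (blue):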

[Figure: the bias $\sigma - E(s)$ as a function of $n$ for $\sigma = 1$ (red), with $1/(4n)$ (blue).]