Estimating variance of population with variance of sample mean


Suppose a population has a given random variable $X$ with unknown mean $\mu$ and unknown variance $\sigma^2$. Sampling theory tells us that for a sample of size $n$, $E(\bar{X}) = \mu$ and $V(\bar{X})=\frac{\sigma^2}{n}$.
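The two sampling facts can be checked exactly for a small discrete population. The sketch below enumerates every equally likely i.i.d. sample and computes $E(\bar X)$ and $V(\bar X)$ with rational arithmetic. The fair six-sided die (so $\mu=\frac72$, $\sigma^2=\frac{35}{12}$) and the sample size $n=3$ are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example population: one roll of a fair six-sided die.
VALUES = range(1, 7)

def population_moments(values):
    """Return the exact mean and variance of a uniform distribution on `values`."""
    vals = [Fraction(v) for v in values]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, var

def sample_mean_moments(values, n):
    """Exact E(X̄) and V(X̄), enumerating every equally likely i.i.d. sample of size n."""
    vals = [Fraction(v) for v in values]
    means = [sum(s) / n for s in product(vals, repeat=n)]
    e_mean = sum(means) / len(means)
    v_mean = sum((m - e_mean) ** 2 for m in means) / len(means)
    return e_mean, v_mean

mu, sigma2 = population_moments(VALUES)
e_xbar, v_xbar = sample_mean_moments(VALUES, n=3)
print(mu, sigma2)      # 7/2 35/12
print(e_xbar, v_xbar)  # 7/2 35/36
```

Because all $6^3=216$ samples are enumerated, the results are exact rather than simulated: $V(\bar X)=\frac{35}{36}=\frac{\sigma^2}{3}$.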

It is well known that $\bar{X}$ is an unbiased estimator of $\mu$. To estimate the parameter $\sigma^2$, how bad an estimator is $n V(\bar{X})$?

Best Answer

Let $X_i$ be i.i.d. Then the estimator is $s^2=n\cdot Var(\overline X)=n\cdot Var\left(\frac1n\cdot\sum\limits_{i=1}^n X_i\right) =\frac1n\cdot Var\left(\sum\limits_{i=1}^n X_i\right)$, where the last step uses $Var(aY)=a^2\,Var(Y)$ with $a=\frac1n$.

Now compute the expected value of $s^2$; this shows that $s^2$ is a biased estimator.

$E\left(\frac1n\cdot Var\left(\sum\limits_{i=1}^n X_i\right) \right)=\frac1n\cdot E\left(Var\left(\sum\limits_{i=1}^n X_i\right) \right)$

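Due to independence, the variance of the sum is the sum of the variances:

$\frac1n\cdot E\left(\sum\limits_{i=1}^n Var \left(X_i\right) \right)$.

Since the variables are identically distributed, this equals $\frac1n\cdot E\left(n\cdot Var \left(X_i\right) \right)=E\left(Var(X_i)\right)$. Because $Var(X_i)=\sigma^2$ is unknown, in practice it has to be replaced by its sample counterpart, so the estimator actually computed from the data is $s^2=\frac1n\sum_{i=1}^n (X_i-\overline X)^2$. Adding and subtracting $\mu$ inside each square, its expectation is

$E\left(s^2\right)=\frac{1}{n}E\left(\sum\limits_{i=1}^n (X_i-\overline X )^2\right)$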

$=\frac{1}{n}E\left[\sum_{i=1}^n \left[(X_i-\mu)-(\overline X-\mu) \right]^2 \right] \quad$

Multiplying out:

$=\frac{1}{n}E\left[\sum_{i=1}^n \left[(X_i-\mu)^2-2(\overline X-\mu)(X_i-\mu)+(\overline X-\mu)^2 \right]\right] \quad$

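Distributing the sum over the three terms, and pulling the factor $(\overline X-\mu)$, which does not depend on $i$, out of the middle sum: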

$=\frac{1}{n}E\left[\sum_{i=1}^n (X_i-\mu)^2-2(\overline X-\mu)\sum_{i=1}^n(X_i-\mu)+\sum_{i=1}^n(\overline X-\mu)^2 \right] \quad$

Since $(\overline X-\mu)^2$ does not depend on $i$, the last sum equals $n(\overline X-\mu)^2$:

$=\frac{1}{n}E\left[\sum_{i=1}^n (X_i-\mu)^2-2(\overline X-\mu)\color{blue}{\sum_{i=1}^n(X_i-\mu)}+n(\overline X-\mu)^2 \right] \quad$


Transforming the blue term, using $\sum_{i=1}^n X_i=n\cdot\overline X$:

$\sum_{i=1}^n(X_i-\mu)=n\cdot \overline X-n\cdot \mu$
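For a concrete check of this step, take the hypothetical sample $x=(1,2,6)$, so $n=3$ and $\overline x=3$, together with $\mu=\frac72$ (the mean of a fair die, chosen only for illustration):

$\sum_{i=1}^3(x_i-\mu)=\left(-\tfrac52\right)+\left(-\tfrac32\right)+\tfrac52=-\tfrac32=9-\tfrac{21}{2}=n\cdot\overline x-n\cdot\mu$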

Thus $2(\overline X-\mu)\color{blue}{\sum_{i=1}^n(X_i-\mu)}=2(\overline X-\mu)\cdot (n\cdot \overline X-n\cdot \mu)=2n( \overline X- \mu)^2$. Substituting this back into the expression gives


$=\frac{1}{n}E\left[\sum_{i=1}^n (X_i-\mu)^2-2n( \overline X- \mu)^2+n(\overline X-\mu)^2 \right] \quad$

$=\frac{1}{n}E\left[\sum_{i=1}^n (X_i-\mu)^2-n( \overline X- \mu)^2\right] \quad$

$=\frac{1}{n}\left[\sum_{i=1}^n E\left[(X_i-\mu)^2\right]-nE\left[( \overline X- \mu)^2\right]\right] \quad$

We know that $E\left[(X_i-\mu)^2\right]=\sigma^2$ by definition, and $E\left[( \overline X- \mu)^2\right]=\sigma_{\overline X}^2=\frac{\sigma^2}{n}$ from the sampling result in the question. Thus we get

$=\frac{1}{n}\left[n \cdot \sigma ^2-n \frac{\sigma ^2}{n}\right]=\sigma^2-\frac{\sigma^2}{n}=\sigma^2\cdot \left(1-\frac1n\right)$
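The formula $E(s^2)=\sigma^2\left(1-\frac1n\right)$ can be confirmed exactly for a small discrete population by averaging the divisor-$n$ sample variance over every equally likely sample. This is a sketch assuming, for illustration only, a fair six-sided die ($\sigma^2=\frac{35}{12}$); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def plug_in_variance(sample):
    """Divisor-n sample variance: (1/n) * sum((x_i - x̄)^2)."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

def expected_plug_in_variance(values, n):
    """Exact E(s^2): average over all len(values)**n equally likely i.i.d. samples."""
    vals = [Fraction(v) for v in values]
    samples = list(product(vals, repeat=n))
    return sum(plug_in_variance(s) for s in samples) / len(samples)

DIE = range(1, 7)          # hypothetical population: a fair die
SIGMA2 = Fraction(35, 12)  # its variance

for n in (2, 3, 4):
    exact = expected_plug_in_variance(DIE, n)
    formula = SIGMA2 * (1 - Fraction(1, n))
    print(n, exact, formula)
# 2 35/24 35/24
# 3 35/18 35/18
# 4 35/16 35/16
```

Exact enumeration avoids Monte Carlo noise, so the agreement with $\sigma^2\left(1-\frac1n\right)$ is an equality, not an approximation.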

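Thus $s^2=n\cdot Var(\overline X)$ is a biased estimator for $\sigma^2$: on average it underestimates $\sigma^2$ by $\frac{\sigma^2}{n}$. But it is asymptotically unbiased, since this bias vanishes as $n\to\infty$.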
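To see the size of the bias in a simulation, the sketch below averages Python's `statistics.pvariance` (divisor $n$, the estimator derived above) and `statistics.variance` (divisor $n-1$, i.e. rescaled by $\frac{n}{n-1}$, which removes the bias) over many simulated samples. The normal population with $\sigma^2=4$, the sample size $n=5$, the seed, and the number of repetitions are assumptions for illustration. For $n=5$ the formula above predicts an average of $4\cdot\left(1-\frac15\right)=3.2$ for the divisor-$n$ estimator.

```python
import random
import statistics

def average_estimates(n, reps, sigma=2.0, seed=0):
    """Average divisor-n and divisor-(n-1) variance estimates over `reps` simulated normal samples."""
    rng = random.Random(seed)
    plug_in, corrected = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        plug_in.append(statistics.pvariance(sample))   # (1/n) * sum((x - x̄)^2)
        corrected.append(statistics.variance(sample))  # (1/(n-1)) * sum((x - x̄)^2)
    return statistics.fmean(plug_in), statistics.fmean(corrected)

biased, unbiased = average_estimates(n=5, reps=10_000)
print(f"divisor n: {biased:.2f} (theory 3.20), divisor n-1: {unbiased:.2f} (theory 4.00)")
```

The simulated averages fluctuate around the theoretical values; increasing `reps` tightens them.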