Variance Estimation – Methods for Estimating Variance Given the Mean


Assume we have a normal distribution with known mean $\mu$. How can we estimate the variance by sampling?
The typical answer to this question is to use the unbiased sample variance estimator; that is, if the data points are denoted by $x_1, \ldots, x_n$, the following:
$$\frac{(x_1-\bar{x})^2+\ldots +(x_n-\bar{x})^2}{n-1}$$
where $\bar{x}$ is the sample mean. Can we use the actual mean in any meaningful way to get a better estimator of the variance? The first thing that comes to mind is to replace $\bar{x}$ by $\mu$ and divide by $n$ instead of $n-1$ (to keep the estimator unbiased). Is this a better estimator, and if so, why?

Best Answer

Suppose you have a random sample of size $n$ from the population $\mathsf{Norm}(\mu, \sigma),$ where $\sigma$ is not known and $\mu$ is known.

Let $V = \frac 1n\sum_{i=1}^n (X_i - \mu)^2.$

Then $V$ is a better estimator of the population variance $\sigma^2$ than is $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ where $\bar X =\frac 1 n \sum_{i=1}^n X_i$: both estimators are unbiased, but $V$ has the smaller variance.
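
One way to make "better" concrete is to compare the variances of the two estimators. The following is a sketch under the normality assumption above, using the standard fact that a $\mathsf{Chisq}(\nu)$ random variable has mean $\nu$ and variance $2\nu$:

$$\begin{aligned}
\frac{nV}{\sigma^2} \sim \mathsf{Chisq}(n) &\implies E(V) = \sigma^2, \quad \mathrm{Var}(V) = \frac{\sigma^4}{n^2}\cdot 2n = \frac{2\sigma^4}{n},\\
\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1) &\implies E(S^2) = \sigma^2, \quad \mathrm{Var}(S^2) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1}.
\end{aligned}$$

So $\mathrm{Var}(V)/\mathrm{Var}(S^2) = (n-1)/n < 1.$ The gain is real but modest: for $n = 50$ the ratio is $0.98.$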

Also, a 95% CI for $\sigma^2$ tends to be narrower if we use $V$ than if we use $S^2.$ [Samples can vary, so this CI is not always narrower.]

In particular, a 95% CI for $\sigma^2$ is based on the relationship $\frac{nV}{\sigma^2} \sim \mathsf{Chisq}(\nu = n).$

Example: Suppose I have the sample x of size $n = 50$ from $\mathsf{Norm}(\mu = 20, \sigma = 3),$ where I assume $\mu$ is known and $\sigma$ is not.

    set.seed(215)
    x = rnorm(50, 20, 3)    # n = 50 draws from Norm(mu = 20, sigma = 3)
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      14.21   17.97   19.94   20.30   22.62   29.31 

    v = sum((x - 20)^2)/50;  v    # V, computed with the known mean 20
    [1] 10.69335

    CI.1 = 50*v/qchisq(c(.975,.025), 50);  CI.1
    [1]  7.486223 16.523827
    diff(CI.1)    # width of CI
    [1] 9.037604

The formula for this confidence interval is $\left(\frac{50V}{U}, \frac{50V}{L}\right),$ where $L$ and $U$ cut probabilities $0.025$ from the lower and upper tails, respectively, of $\mathsf{Chisq}(\nu=50).$ For the data of my example, the CI is $(7.49,\, 16.52)$ of width $9.04.$
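
As a brief sketch of where this interval comes from, for general $n$ and with $L$ and $U$ now denoting the $0.025$ and $0.975$ quantiles of $\mathsf{Chisq}(\nu = n),$ one can invert the chi-squared relationship for $\sigma^2$:

$$0.95 = P\left(L \le \frac{nV}{\sigma^2} \le U\right) = P\left(\frac{nV}{U} \le \sigma^2 \le \frac{nV}{L}\right).$$

The larger quantile $U$ ends up in the denominator of the lower limit, which is why the R code lists the quantile levels in the order `c(.975,.025)`.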

By contrast, the 95% CI for $\sigma^2$ based on $S^2,$ where $\mu$ is estimated by $\bar X,$ uses the relationship $\frac{(n-1)S^2}{\sigma^2}\sim\mathsf{Chisq}(\nu=49).$

    CI.2 = 49*var(x)/qchisq(c(.975,.025), 49);  CI.2    # var(x) is S^2
    [1]  7.548087 16.797538
    diff(CI.2)    # wider CI
    [1] 9.249451

For the data of my example, the CI is $(7.55,\, 16.80)$ of width $9.25 > 9.04.$
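
A single sample cannot show how often the interval based on $V$ is the narrower one. The R sketch below repeats the experiment many times under the same settings as the example ($n = 50,$ $\mu = 20,$ $\sigma = 3$). The seed, the number of replications `m`, and the variable names are illustrative assumptions, not part of the example above.

```r
set.seed(2024)                       # arbitrary seed, for reproducibility only
m = 10000;  n = 50;  mu = 20;  sg = 3
q.n  = qchisq(c(.975, .025), n)      # quantiles for the interval based on V
q.n1 = qchisq(c(.975, .025), n - 1)  # quantiles for the interval based on S^2

x  = matrix(rnorm(m*n, mu, sg), nrow = m)  # each row is one sample of size n
v  = rowMeans((x - mu)^2)                  # V for each sample (known mean)
s2 = apply(x, 1, var)                      # S^2 for each sample

# CI width = (upper limit) - (lower limit) = n*V*(1/L - 1/U), and similarly for S^2
w.1 = n * v * diff(1/q.n)
w.2 = (n - 1) * s2 * diff(1/q.n1)

mean(w.1 < w.2)        # proportion of samples in which the V-based CI is narrower
mean(w.1);  mean(w.2)  # average widths of the two intervals
var(v);  var(s2)       # compare with 2*sg^4/n = 3.24 and 2*sg^4/(n-1), about 3.31
```

Because each width is a fixed multiple of its estimator, how often the $V$-based interval is narrower depends only on how $V$ and $S^2$ compare within each sample.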
