Solved – Difference between the expectation of x bar squared and the expectation of x squared

Tags: expected-value, maximum-likelihood, self-study

I am trying to understand the derivation of the expectation of the maximum likelihood estimator (MLE) of the variance, but I am confused about the difference between $\bar{x}$ and $x$. Below is the derivation up to the step that I do not understand:

What is the difference between the expectation of $x^2$ and the expectation of $\bar{x}^2$? What is it about this difference that leads to a biased estimator?

Best Answer

Let's begin with some proper notation. Suppose you have a random sample $X_1, X_2, \dots,X_n$ of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma.$

Estimating the population mean. Then $\hat\mu = \bar X = \frac 1n\sum_{i=1}^n X_i$ is the maximum likelihood estimator (MLE) of $\mu.$ It is an unbiased estimator because $E(\bar X) = \mu.$

Each individual observation $X_i,$ say $X_1$ to be specific, also has $E(X_1) = \mu,$ and so is unbiased. But we use $\bar X$ instead of $X_i$ because $Var(\bar X) = \sigma^2/n,$ while $Var(X_i) = \sigma^2.$ It is best to use the estimator with the smaller variance.
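
As a quick numerical check of these two variance claims, here is a small simulation sketch, using the same $\mathsf{Norm}(\mu=100, \sigma=15)$ population as in the examples below, so that $\sigma^2 = 225$ and $\sigma^2/n = 22.5$ for $n = 10$:

set.seed(1234)
m = 10^5;  n = 10
xbar = replicate(m, mean(rnorm(n, 100, 15)))
var(xbar)      # near sigma^2/n = 22.5
x1 = rnorm(m, 100, 15)
var(x1)        # near sigma^2 = 225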

Estimating the population variance. The MLE of $\sigma^2$ is $\hat{\sigma^2}= \frac 1n\sum_{i=1}^n (X_i - \bar X)^2.$

One can show (by expanding the square and collecting terms, using $\sum_{i=1}^n X_i = n\bar X$) that $$\sum_{i=1}^n(X_i-\bar X)^2 = \sum_{i=1}^n [X_i^2 -2\bar XX_i + \bar X^2] = \sum_{i=1}^n X_i^2 - 2n\bar X^2 + n\bar X^2 = \sum_{i=1}^n X_i^2 -n\bar X^2,$$ so that $\hat{\sigma^2} = \frac 1n\sum_{i=1}^n X_i^2 - \bar X^2.$ However, one can show that $E(\hat{\sigma^2}) = \frac{n-1}{n}\sigma^2,$ so that $\hat{\sigma^2}$ is biased on the low side.
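
The identity is easy to verify numerically; here is a minimal check (the seed and sample are arbitrary):

set.seed(1)
x = rnorm(10, 100, 15);  n = length(x)
sum((x - mean(x))^2)       # left-hand side
sum(x^2) - n*mean(x)^2     # right-hand side: same, up to floating point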

This is one reason that statisticians define the 'sample variance' as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2$ and use $S^2$ to estimate $\sigma^2.$

In R statistical software the sample variance of a vector x of random observations is found as var(x), using the formula just shown with $n-1$ in the denominator; sd(x) gives the sample standard deviation, its square root.
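
For instance, with some made-up data:

x = c(2, 4, 4, 4, 5, 5, 7, 9)
var(x)                                # sample variance, n-1 denominator
sum((x - mean(x))^2)/(length(x) - 1)  # same value, by the formula above
sd(x)^2                               # sd(x) is the square root of var(x)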

Here is a numerical demonstration. Suppose we take a random sample of size $n=10$ from a population distributed as $\mathsf{Norm}(\mu = 100, \sigma=15),$ so that the population variance is $\sigma^2 = 225.$ For this particular sample, I happened to get $S^2 = 223.1$ and $\hat{\sigma^2} = 200.8.$ The unbiased version $S^2$ of the MLE gives a value closer to $\sigma^2 = 225$ than the MLE itself.

set.seed(1234)
n = 10
x = rnorm(n, 100, 15)
unb = var(x);  unb
[1] 223.1084
mle = sum(x^2)/n - mean(x)^2;  mle
[1] 200.7975

However, variance estimates are quite variable, so if you remove the set.seed statement at the start of the code just above and run it again, you may get very different results. A single run could be considered a "dishonest" demonstration if I had picked one of several runs that I considered 'typical'. An "honest" demonstration averages over many runs; doing so below, the average result is close to what I have shown above:

set.seed(1234)
m = 10^6;  q = a = s = numeric(m)
n = 10
for(i in 1:m) {
 x = rnorm(n, 100, 15)
 a[i] = mean(x);  q[i] = sum(x^2)
 s[i] = sd(x) }
mle = q/n - a^2;  mean(mle)
[1] 202.5567         # downward bias for MLE: theory gives (9/10)*225 = 202.5
unb = s^2;  mean(unb)
[1] 225.063          # about right for unbiased est

Confidence intervals for population mean and variance. When neither $\mu$ nor $\sigma^2$ is known, here are the usual forms of confidence intervals for these parameters.

The quantity $\frac{\bar X - \mu}{S/\sqrt{n}} \sim \mathsf{T}(n-1),$ Student's t distribution with $n-1$ degrees of freedom. Consequently, a 95% confidence interval for $\mu$ is of the form $\bar X \pm t^*S/\sqrt{n},$ where $\pm t^*$ cut probability $0.025 = 2.5\%$ from the upper and lower tails of $\mathsf{T}(n-1),$ respectively.

The quantity $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$ a chi-squared distribution with $n-1$ degrees of freedom. Consequently, a 95% CI for $\sigma^2$ is of the form $\left(\frac{(n-1)S^2}{U},\frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ cut probabilities $0.025=2.5\%$ from the lower and upper tails of $\mathsf{Chisq}(n-1),$ respectively.

For the data in my example above, the 95% CIs are $(83.6, 104.9)$ for $\mu$ [which does include 100] and $(105.6, 743.6)$ for $\sigma^2$ [which does include 225].

set.seed(1234)
n = 10;  x = rnorm(n, 100, 15)    # same sample as above
t.star = qt(c(.025,.975), n-1)
mean(x) + t.star*sd(x)/sqrt(n)
[1]  83.56749 104.93779
LU = qchisq(c(.975,.025), n-1)
(n-1)*var(x)/LU
[1] 105.5564 743.5874

Addendum: You have almost asked a really important question. However, there is a cleaner way to look at it. We are still assuming data are randomly sampled from a normal population.

Suppose $\mu$ is known and $\sigma^2$ is not. Then it's natural to look at $V = \frac 1n \sum (X_i-\mu)^2$ as an estimator of $\sigma^2.$ One can show that $V$ is the MLE and that it is unbiased. To show unbiasedness, consider $$\sum \left(\frac{X_i-\mu}{\sigma}\right)^2 =\sum Z_i^2 \sim \mathsf{Chisq}(n),$$ where sums are taken over $i = 1$ to $n,$ $Z_i \stackrel{iid}{\sim}\mathsf{Norm}(0,1),\;$ $Z_i^2 \stackrel{iid}{\sim}\mathsf{Chisq}(1),\;$ and the distribution $\mathsf{Chisq}(n)$ has mean $n.$ Thus, $E\left(\frac 1n\sum (X_i - \mu)^2\right)=\sigma^2.$
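
A simulation check of this claim (a sketch, reusing the $\mathsf{Norm}(100, 15)$ population, with $\mu = 100$ treated as known):

set.seed(1234)
m = 10^5;  n = 10;  mu = 100;  sigma = 15
v = replicate(m, mean((rnorm(n, mu, sigma) - mu)^2))
mean(v)    # close to sigma^2 = 225, consistent with unbiasedness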

Suppose both $\mu$ and $\sigma^2$ are unknown. Then it is natural to estimate $\sigma^2$ by $S^2 = \frac{1}{n-1}\sum (X_i-\bar X)^2.$ It is not trivial to prove, but suppose you are willing to believe $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$ which has mean $n-1.$ Then $E(S^2) = \frac{\sigma^2}{n-1}E\!\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{\sigma^2}{n-1}(n-1) = \sigma^2.$ The arm-waving explanation for the difference between $n$ degrees of freedom and $n-1$ is that we have "lost" a degree of freedom by estimating $\mu$ by $\bar X,$ because of the linear constraint $\sum(X_i - \bar X) \equiv 0.$

In the simulation with a million iterations above, let $H = \frac{(n-1)S^2}{\sigma^2} = \frac{9S^2}{15^2},$ then we get the histogram below:

[Histogram of $H$, titled "CHISQ(9), not CHISQ(10)", with the $\mathsf{Chisq}(9)$ density (solid) and the $\mathsf{Chisq}(10)$ density (dotted) overlaid; the $\mathsf{Chisq}(9)$ curve matches the histogram.]

h = 9*s^2/15^2
hdr="CHISQ(9), not CHISQ(10)"
hist(h, prob=T, br=30, col="skyblue4", main=hdr)
 curve(dchisq(x,9), add=T, lwd=2)
 curve(dchisq(x,10), add=T, lwd=3, lty="dotted", col="orange")

Note: As shown above, $(n-1)S^2 = \sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n X_i^2 - n\bar X^2,$ where the first equality is by definition and the second by algebra. However, in numerical computations with the second form, one must take care not to round intermediate results: $\sum_{i=1}^n X_i^2$ and $n\bar X^2$ can be nearly equal, so their difference is vulnerable to serious cancellation error, as the following example shows.

set.seed(2020)
x = rnorm(10, 5, .1)
var(x)
[1] 0.01665682
Q = sum(x^2); A = mean(x)
n = length(x); C = n*A^2
Q; C; (Q - C)/(n-1)
[1] 249.1114
[1] 248.9615
[1] 0.01665682              # correct
q = round(Q); c = round(C)
q; c; (q - c)/(n-1)
[1] 249
[1] 249
[1] 0                       # incorrect due to rounding