To close this one:
The density is recognized as that of a Gamma distribution with shape parameter $k=3$ and unknown scale parameter $\theta$, so we have $E(X) = 3\theta$ and $\operatorname {Var}(X) = 3\theta^2.$
Given an i.i.d. sample, the MLE is $\hat \theta_{MLE} = \bar X/3,$ so its expected value and variance are
$$E(\hat \theta_{MLE}) = \theta,\;\;\operatorname {Var}(\hat \theta_{MLE}) = \frac 1 {3n}\theta^2$$
For consistency, we can then use the sufficient conditions: $\lim_{n\rightarrow\infty}E(\hat \theta_{MLE})=\theta$ holds, and $\lim_{n\rightarrow\infty}\operatorname {Var}(\hat \theta_{MLE})=\lim_{n\rightarrow\infty}\frac {\theta^2} {3n}=0$ holds too, so the MLE is consistent.
The Central Limit Theorem holds, and so
$$\sqrt n(\hat \theta_{MLE}-\theta) \rightarrow_d \mathcal N(0, \frac 13\theta^2)$$
The Fisher Information is
$$\mathcal I(\theta) = -E\left[\frac 3 {\theta^2}-2\frac X {\theta^3}\right] = -\frac 3 {\theta^2} + 2\frac {E(X)}{\theta^3} = -\frac 3 {\theta^2} + 2\frac {3\theta}{\theta^3} = \frac 3{\theta^2}$$
and so $\operatorname{Var}(\hat \theta_{MLE}) = \frac{\theta^2}{3n} = \frac 1{n\,\mathcal I(\theta)},$ i.e. the MLE achieves the Cramér-Rao bound.
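A quick simulation sketch in R (parameter values and seed assumed for illustration) confirms the mean and variance of $\hat \theta_{MLE} = \bar X/3$ computed above:

```r
set.seed(1)                # assumed seed, for reproducibility
theta <- 2                 # assumed true scale parameter
n <- 50                    # assumed sample size
m <- 1e5                   # number of simulated samples
est <- replicate(m, mean(rgamma(n, shape = 3, scale = theta)) / 3)  # MLE = xbar/3
mean(est)                  # close to theta = 2          (unbiasedness)
var(est)                   # close to theta^2/(3n) = 4/150, about 0.0267
```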
As for the question about the minimum-variance estimator, @cardinal's answer here
https://math.stackexchange.com/questions/28779/minimum-variance-unbiased-estimator-for-scale-parameter-of-a-certain-gamma-distr
shows that the MLE is a function of a complete and sufficient statistic, and hence is the minimum-variance unbiased estimator.
Christoph Hanck has not posted the details of his proposed example. I take it he means the uniform distribution on the interval $[0,\theta],$ based on an i.i.d. sample $X_1,\ldots,X_n$ of size $n>1.$
The mean is $\theta/2$.
The MLE of the mean is $\max\{X_1,\ldots,X_n\}/2.$
That is biased since $\Pr(\max < \theta) = 1,$ so $\operatorname{E}({\max}/2)<\theta/2.$
PS: Perhaps we should note that the best unbiased estimator of the mean $\theta/2$ is not the sample mean, but rather is $$\frac{n+1} {2n} \cdot \max\{X_1,\ldots,X_n\}.$$ The sample mean is a lousy estimator of $\theta/2$ because for some samples, the sample mean is less than $\dfrac 1 2 \max\{X_1,\ldots,X_n\},$ and it is clearly impossible for $\theta/2$ to be less than ${\max}/2.$
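A short R sketch (with assumed $\theta,$ $n,$ and seed) illustrates both the bias of ${\max}/2$ and the unbiasedness of the corrected estimator:

```r
set.seed(1)                            # assumed seed
theta <- 1; n <- 10; m <- 1e5          # assumed parameter, sample size, replications
mx <- replicate(m, max(runif(n, 0, theta)))
mean(mx / 2)                           # about (n/(n+1)) * theta/2 = 0.4545: biased low
mean((n + 1) / (2 * n) * mx)           # about theta/2 = 0.5: unbiased
var((n + 1) / (2 * n) * mx)            # smaller than Var(sample mean) = theta^2/(12n)
```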
end of PS
I suspect the Pareto distribution is another such case. Here's the probability measure:
$$
\alpha\left( \frac \kappa x \right)^\alpha\ \frac{dx} x \text{ for } x >\kappa.
$$
The expected value is $\dfrac \alpha {\alpha -1 } \kappa.$ The MLE of the expected value is
$$
\frac n {n - \sum_{i=1}^n \big((\log X_i) - \log(\min)\big)} \cdot \min
$$
where $\min = \min\{X_1,\ldots,X_n\}.$
I haven't worked out the expected value of the MLE for the mean, so I don't know what its bias is.
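A small R sketch (assumed $\alpha,$ $\kappa,$ and seed) draws a Pareto sample by inverse-CDF sampling and evaluates this estimator; repeating it many times would give a Monte Carlo estimate of the bias:

```r
set.seed(1)                            # assumed seed
alpha <- 3; kappa <- 2                 # assumed parameters; true mean = alpha*kappa/(alpha-1) = 3
n <- 100                               # assumed sample size
x <- kappa * runif(n)^(-1 / alpha)     # inverse-CDF draw from the Pareto density above
mn <- min(x)
n / (n - sum(log(x) - log(mn))) * mn   # MLE of the mean; near 3 for large n
```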
Best Answer
Let's begin with some proper notation. Suppose you have a random sample $X_1, X_2, \dots,X_n$ of size $n$ from a normal population with mean $\mu$ and standard deviation $\sigma.$
Estimating the population mean. Then $\hat\mu = \bar X = \frac 1n\sum_{i=1}^n X_i$ is the maximum likelihood estimator (MLE) of $\mu.$ It is an unbiased estimator because $E(\bar X) = \mu.$
Each individual observation $X_i,$ say $X_1$ to be specific, also has $E(X_1) = \mu,$ and so is unbiased. But we use $\bar X$ instead of $X_i$ because $Var(\bar X) = \sigma^2/n,$ while $Var(X_i) = \sigma^2.$ It is best to use the estimator with the smaller variance.
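A brief R simulation (assumed values) makes the variance comparison concrete:

```r
set.seed(1)                            # assumed seed
mu <- 100; sigma <- 15                 # assumed population parameters
n <- 10; m <- 1e5                      # assumed sample size; replications
xbar <- replicate(m, mean(rnorm(n, mu, sigma)))
mean(xbar)                             # about mu = 100     (unbiased)
var(xbar)                              # about sigma^2/n = 22.5, versus Var(X_1) = 225
```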
Estimating the population variance. The MLE of $\sigma^2$ is $\hat{\sigma^2}= \frac 1n\sum_{i=1}^n (X_i - \bar X)^2.$
One can show (by expanding the square and collecting terms) that $$\sum_{i=1}^n(X_i-\bar X)^2 = \sum_{i=1}^n [X_i^2 -2\bar XX_i + \bar X^2] = \sum_{i=1}^n X_i^2 -n\bar X^2,$$ so that $\hat{\sigma^2} = \frac 1n\sum_{i=1}^n X_i^2 - \bar X^2.$ However, one can show that $E(\hat{\sigma^2}) = \frac{n-1}{n}\sigma^2,$ so that $\hat{\sigma^2}$ is biased on the low side.
This is one reason that statisticians define the 'sample variance' as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2$ and use $S^2$ to estimate $\sigma^2.$
In R statistical software, the sample variance of a vector `x` of random observations is found as `var(x)`, using the formula just shown with $n-1$ in the denominator.

Here is a numerical demonstration; a sketch of the code follows below. Suppose we take a random sample of size $n=10$ from a population distributed as $\mathsf{Norm}(\mu = 100, \sigma=15),$ so that the population variance is $\sigma^2 = 225.$ For this particular sample, I happened to get $S^2 = 223.1$ and $\hat{\sigma^2} = 200.8.$ The unbiased version $S^2$ of the MLE gives a value closer to $\sigma^2 = 225$ than the MLE itself.
However, variance estimates are quite variable, so if you remove the `set.seed` statement at the start of the code just above and run it again, you may get very different results. This could be considered a "dishonest" simulation, because I picked one of several runs that I considered 'typical'. An "honest" alternative is to do many runs and report an average result; the average over many runs is close to what I have shown above.

Confidence intervals for population mean and variance. When neither $\mu$ nor $\sigma^2$ is known, here are the usual forms of confidence intervals for these parameters.
The quantity $\frac{\bar X - \mu}{S/\sqrt{n}} \sim \mathsf{T}(n-1),$ Student's t distribution with $n-1$ degrees of freedom. Consequently, a 95% confidence interval for $\mu$ is of the form $\bar X \pm t^*S/\sqrt{n},$ where $\pm t^*$ cut probability $0.025 = 2.5\%$ from the upper and lower tails of $\mathsf{T}(n-1),$ respectively.
The quantity $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$ a chi-squared distribution with $n-1$ degrees of freedom. Consequently, a 95% CI for $\sigma^2$ is of the form $\left(\frac{(n-1)S^2}{U},\frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ cut probabilities $0.025=2.5\%$ from the lower and upper tails of $\mathsf{Chisq}(n-1),$ respectively.
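In R, with a sample `x` as in the demonstration above, these intervals can be computed along the following lines (a sketch, with the same assumed seed):

```r
set.seed(2021)                                          # assumed seed, matching the sketch above
x <- rnorm(10, mean = 100, sd = 15)
n <- length(x)
t.star <- qt(0.975, df = n - 1)                         # cuts 2.5% from the upper tail of T(n-1)
mean(x) + c(-1, 1) * t.star * sd(x) / sqrt(n)           # 95% CI for mu
(n - 1) * var(x) / qchisq(c(0.975, 0.025), df = n - 1)  # 95% CI for sigma^2 (lower, upper)
```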
For the data in my example above, the 95% CIs are $(83.6, 104.9)$ for $\mu$ [which does include 100] and $(105.6, 743.6)$ for $\sigma^2$ [which does include 225].
Addendum: You have almost asked a really important question. However, there is a cleaner way to look at it. We are still assuming data are randomly sampled from a normal population.
Suppose $\mu$ is known and $\sigma^2$ is not. Then it's natural to look at $V = \frac 1n \sum (X_i-\mu)^2$ as an estimator of $\sigma^2.$ One can show that $V$ is the MLE and that it is unbiased. To show unbiasedness, consider $$\sum \left(\frac{X_i-\mu}{\sigma}\right)^2 =\sum Z_i^2 \sim \mathsf{Chisq}(n),$$ where sums are taken over $i = 1$ to $n,$ $Z_i \stackrel{iid}{\sim}\mathsf{Norm}(0,1),\;$ $Z_i^2 \stackrel{iid}{\sim}\mathsf{Chisq}(1),\;$ and the distribution $\mathsf{Chisq}(n)$ has mean $n.$ Thus, $E\left(\frac 1n\sum (X_i - \mu)^2\right)=\sigma^2.$
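A quick R check of the unbiasedness of $V$ (assumed parameter values):

```r
set.seed(1)                            # assumed seed
mu <- 100; sigma <- 15                 # assumed known mean and sd
n <- 10; m <- 1e5
v <- replicate(m, mean((rnorm(n, mu, sigma) - mu)^2))   # V, computed with known mu
mean(v)                                # about sigma^2 = 225 (unbiased)
```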
Suppose both $\mu$ and $\sigma^2$ are unknown. Then it is feasible to estimate $\sigma^2$ by $S^2 = \frac{1}{n-1}\sum (X_i-\bar X)^2.$ It is not trivial to prove, but suppose you are willing to believe $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$ which has mean $n-1.$ Then it is easy to see that $E(S^2) = \sigma^2.$ The arm-waving explanation for the difference between $n$ degrees of freedom and $n-1$ is that we have "lost" a degree of freedom by estimating $\mu$ by $\bar X,$ because of the linear constraint $\sum(X_i - \bar X) \equiv 0.$
In a simulation with a million iterations (a sketch follows below), let $H = \frac{(n-1)S^2}{\sigma^2} = \frac{9S^2}{15^2}.$ The histogram of the simulated values of $H$ (not shown here) agrees closely with the density of $\mathsf{Chisq}(n-1) = \mathsf{Chisq}(9).$
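A sketch of such a simulation (assumed seed), with the chi-squared density overlaid on the histogram:

```r
set.seed(2021)                         # assumed seed
m <- 1e6; n <- 10; sigma <- 15         # a million iterations, as described
h <- replicate(m, (n - 1) * var(rnorm(n, 100, sigma)) / sigma^2)
mean(h)                                # about n - 1 = 9, the mean of Chisq(9)
hist(h, prob = TRUE, breaks = 50, main = "H with Chisq(9) density")
curve(dchisq(x, df = n - 1), add = TRUE, lwd = 2, col = "red")
```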
Note: As shown above, $(n-1)S^2 = \sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n X_i^2 - n\bar X^2,$ where the first equality is by definition and the second by algebra. However, in numerical computations with the second form, one needs to take care not to round any intermediate results in order to prevent serious errors.
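For instance, with values of large magnitude and a small spread, the shortcut form can suffer catastrophic cancellation in double precision, while the definitional form remains accurate (an illustrative example, not from the original answer):

```r
x <- 1e8 + c(1, 2, 3)                  # large magnitude, tiny spread (contrived example)
n <- length(x)
sum((x - mean(x))^2)                   # definitional form: exactly 2
sum(x^2) - n * mean(x)^2               # shortcut form: cancellation ruins the result
```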