Maximum Likelihood – Why Use Maximum Likelihood Estimation Despite Bias

Tags: maximum-likelihood, method-of-moments, normal-distribution

Maximum likelihood estimation often results in biased estimators (e.g., its estimate of the variance of a Gaussian distribution is biased).

What then makes it so popular, and why is it used so much? Also, what in particular makes it better than the alternative approach, the method of moments?

Also, I noticed that for the Gaussian, a simple scaling of the MLE estimator makes it unbiased. Why is this scaling not a standard procedure? That is, why is it not routine, after computing the MLE, to find the scaling needed to make the estimator unbiased? The standard practice seems to be to report the plain MLE estimates, except of course in the familiar Gaussian case, where the scaling factor is well known.
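
For concreteness, here is the scaling I mean as a small sketch (the code is my own illustration, not part of the question itself): for a Gaussian sample, multiplying the MLE of the variance by $n/(n-1)$ reproduces the familiar unbiased estimator.

```python
# Illustration only: rescaling the Gaussian variance MLE by n/(n-1)
# gives the usual unbiased estimator.
import numpy as np

x = np.random.default_rng(1).normal(size=20)
n = len(x)

var_mle = np.var(x)                    # divides by n: the (biased) MLE
var_unbiased = var_mle * n / (n - 1)   # the simple rescaling
assert np.isclose(var_unbiased, np.var(x, ddof=1))  # matches the n-1 divisor
```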

Best Answer

Unbiasedness isn't necessarily especially important on its own.

Aside from a very limited set of circumstances, most useful estimators are biased, however they're obtained.

If two estimators have the same variance, one can readily mount an argument for preferring an unbiased one to a biased one, but that's an unusual situation to be in (that is, you may reasonably prefer unbiasedness, ceteris paribus -- but those pesky ceteris are almost never paribus).

More typically, if you want unbiasedness you'll be adding some variance to get it, and then the question becomes: why would you do that?

Bias is how far my estimator will be too high on average (with negative bias indicating it will be too low).
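
In symbols, for an estimator $\hat{\theta}$ of a parameter $\theta$,

$$\operatorname{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$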

When I'm considering a small-sample estimator, I don't really care about that. I'm usually more interested in how far wrong my estimator will be in this instance: my typical distance from the truth. Something like a root-mean-square error (RMSE) or a mean absolute error (MAE) would make more sense.
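
To make "typical distance from the truth" concrete, those two criteria are

$$\text{RMSE}(\hat{\theta}) = \sqrt{E\big[(\hat{\theta} - \theta)^2\big]}, \qquad \text{MAE}(\hat{\theta}) = E\big[|\hat{\theta} - \theta|\big].$$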

So if you like both low variance and low bias, asking for, say, a minimum mean square error (MSE) estimator would make sense; such estimators are very rarely unbiased.
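
The connection between the three quantities is the usual decomposition

$$\operatorname{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta})^2,$$

which is why trading a small amount of bias for a large reduction in variance can lower the MSE.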

Bias is a useful notion to be aware of, but unbiasedness isn't an especially useful property to seek unless you're comparing only estimators with the same variance.

ML estimators tend to be low-variance; they're usually not minimum MSE, but they often have lower MSE than the estimators you would get by modifying them to be unbiased (when that's possible at all).

As an example, consider estimating the variance when sampling from a normal distribution. Writing $S^2 = \sum_{i=1}^n (x_i - \bar{x})^2$ for the sum of squared deviations from the sample mean, the three estimators are

$$\hat{\sigma}^2_\text{MMSE} = \frac{S^2}{n+1}, \qquad \hat{\sigma}^2_\text{MLE} = \frac{S^2}{n}, \qquad \hat{\sigma}^2_\text{Unb} = \frac{S^2}{n-1}$$

(indeed, the minimum-MSE estimator of the variance always has a larger denominator than $n-1$).
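
A quick simulation sketch (my own, not part of the answer above; $n = 10$ and true variance $\sigma^2 = 1$ are chosen arbitrarily for the run) makes the trade-off visible: the unbiased estimator has essentially zero bias but the largest MSE, while the $n+1$ denominator gives the smallest MSE.

```python
# Simulation sketch: bias and MSE of S^2/(n+1), S^2/n, S^2/(n-1)
# for samples of size n from a normal distribution with variance 1.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 10, 1.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
# S^2 = sum of squared deviations from each sample's mean
S2 = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for label, denom in [("MMSE (n+1)", n + 1), ("MLE  (n)  ", n), ("Unb  (n-1)", n - 1)]:
    est = S2 / denom
    print(f"{label}: bias = {est.mean() - sigma2:+.4f}, "
          f"MSE = {((est - sigma2) ** 2).mean():.4f}")
```

For $n = 10$ the theoretical MSEs are $2/(n+1) \approx 0.182$ for the MMSE estimator, $(2n-1)/n^2 = 0.19$ for the MLE, and $2/(n-1) \approx 0.222$ for the unbiased one, which the simulation should reproduce closely.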