# Solved – Intuitive reasoning behind biased maximum likelihood estimators

Tags: bias, intuition, maximum likelihood

I am confused about biased maximum likelihood (ML) estimators. The mathematics of the whole concept is pretty clear to me, but I cannot figure out the intuitive reasoning behind it.

Given a dataset of samples from a distribution that depends on a parameter we want to estimate, the ML estimator returns the value of the parameter which is most likely to produce the dataset.

I cannot intuitively understand a biased ML estimator in the sense that: how can the most likely value for the parameter predict the real value of the parameter with a bias towards a wrong value?

> the ML estimator results in the value for the parameter which is most likely to occur in the dataset.

Given the assumptions, the ML estimator is the value of the parameter that has the best chance of producing the data set.

> I cannot intuitively understand a biased ML estimator in the sense that "How can the most likely value for the parameter predict the real value of the parameter with a bias towards a wrong value?"

Bias is about expectations of sampling distributions. "Most likely to produce the data" isn't about expectations of sampling distributions. Why would they be expected to go together?

On what basis is it surprising that they don't necessarily correspond?

I'd suggest you consider some simple cases of MLE and ponder how the difference arises in those particular cases.

As an example, consider observations from a uniform distribution on $$(0,\theta)$$. The largest observation is (necessarily) no bigger than the parameter, so the parameter can only take values at least as large as the largest observation.

When you consider the likelihood for $$\theta$$ over the values the sample still allows, it is (obviously) larger the closer $$\theta$$ is to the largest observation. So it's maximized at the largest observation; that's clearly the estimate for $$\theta$$ that maximizes the chance of obtaining the sample you got.
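To make the shape of this likelihood concrete, here is a minimal Python sketch. It assumes the observations are independent draws from $$U(0,\theta)$$, in which case each one has density $$1/\theta$$ as long as $$\theta$$ is at least as large as the largest observation. The function name `uniform_likelihood` and the sample values are made up purely for illustration.

```python
def uniform_likelihood(theta, sample):
    """Likelihood of theta for independent U(0, theta) observations (illustrative helper)."""
    n = len(sample)
    # A theta below the largest observation cannot have produced the sample.
    if theta < max(sample):
        return 0.0
    # Otherwise every observation contributes a density of 1/theta.
    return theta ** (-n)


sample = [0.9, 2.4, 1.7, 3.1, 0.5]  # hypothetical data
x_max = max(sample)

# One candidate below the largest observation, the largest observation itself,
# and two candidates above it.
candidates = [x_max - 0.5, x_max, x_max + 0.5, x_max + 2.0]
likelihoods = [uniform_likelihood(t, sample) for t in candidates]

for t, lik in zip(candidates, likelihoods):
    print(f"theta = {t:.2f}  likelihood = {lik:.6f}")
```

The likelihood is zero below the largest observation, jumps to its peak exactly there, and then falls off as $$\theta$$ grows.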

But on the other hand it must be biased, since the largest observation is obviously (with probability 1) smaller than the true value of $$\theta$$; any other estimate of $$\theta$$ not already ruled out by the sample itself must be larger than it, and must (quite plainly in this case) be less likely to produce the sample.
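A quick simulation can illustrate the bias. The following is only a sketch under assumed settings: the true value `theta = 5.0`, the sample size `n = 5`, the number of repetitions, and the seed are all arbitrary choices for illustration, not anything from the question.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

theta = 5.0      # assumed "true" parameter for the simulation
n = 5            # observations per sample
reps = 20000     # number of simulated samples

# For each simulated sample, record the MLE, i.e. the largest observation.
maxima = [
    max(random.uniform(0.0, theta) for _ in range(n))
    for _ in range(reps)
]

mean_max = sum(maxima) / reps
largest_max = max(maxima)

print(f"true theta:                    {theta}")
print(f"largest MLE over all samples:  {largest_max:.4f}")
print(f"average MLE over all samples:  {mean_max:.4f}")
```

Every simulated MLE sits below the true value, so their average has to sit below it as well. That systematic shortfall is exactly what bias means.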

The expectation of the largest observation from a $$U(0,\theta)$$ is $$\frac{n}{n+1}\theta$$, so the usual way to unbias it is to take as the estimator of $$\theta$$: $$\hat\theta=\frac{n+1}{n}X_{(n)}$$, where $$X_{(n)}$$ is the largest observation.
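For completeness, here is a short derivation of that expectation, assuming the $$n$$ observations are independent. The largest observation is at most $$x$$ exactly when all $$n$$ observations are, so for $$0 \le x \le \theta$$:

$$P\left(X_{(n)} \le x\right) = \left(\frac{x}{\theta}\right)^n, \qquad f_{X_{(n)}}(x) = \frac{n\,x^{n-1}}{\theta^n}$$

$$E\left[X_{(n)}\right] = \int_0^\theta x \cdot \frac{n\,x^{n-1}}{\theta^n}\,dx = \frac{n}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1} = \frac{n}{n+1}\,\theta$$

Multiplying $$X_{(n)}$$ by $$\frac{n+1}{n}$$ therefore gives an estimator whose expectation is exactly $$\theta$$.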

This lies to the right of the MLE, and so has lower likelihood.
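The drop in likelihood can be checked numerically. The sketch below again assumes independent $$U(0,\theta)$$ observations and uses made-up data; `uniform_likelihood` is an illustrative helper, not a library function.

```python
def uniform_likelihood(theta, sample):
    """Likelihood of theta for independent U(0, theta) observations (illustrative helper)."""
    n = len(sample)
    if theta < max(sample):
        return 0.0          # theta is ruled out by the sample
    return theta ** (-n)    # each observation has density 1/theta


sample = [0.9, 2.4, 1.7, 3.1, 0.5]  # hypothetical data
n = len(sample)

mle = max(sample)                # maximum likelihood estimate
unbiased = (n + 1) / n * mle     # bias-corrected estimate

# Likelihood of the bias-corrected estimate relative to the MLE.
ratio = uniform_likelihood(unbiased, sample) / uniform_likelihood(mle, sample)

print(f"MLE:                {mle:.3f}")
print(f"unbiased estimate:  {unbiased:.3f}")
print(f"likelihood ratio:   {ratio:.4f}")
```

Under these assumptions the ratio works out to $$\left(\frac{n}{n+1}\right)^n$$ whatever the data are, which is always below 1: the unbiased estimator gives up some likelihood in exchange for being right on average.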