[Math] Why does maximum likelihood estimation for the uniform distribution give the maximum of the data

estimation, probability, random variables

I am looking at parameter estimation for the uniform distribution in the context of MLEs. I know that the likelihood function of the uniform distribution $U(0,\theta)$ is $1/\theta^n$ (for $\theta \ge \max(x_i)$), and that its maximum cannot be found by setting the derivative with respect to $\theta$ to zero. The conclusion is that the estimate of $\theta$ is $\max(x_i)$, where $x_1,x_2,\ldots,x_n$ is the random sample. I would like a layman's explanation for this.

Best Answer

You asked for an intuitive explanation. So let's think about what happens when we take a sample $x_1, x_2, \ldots, x_n$ from a uniform $[0,\theta]$ random variable, where $\theta$ is a fixed but unknown parameter. From the data, we would like to calculate a statistic (a function of the data) that estimates the value of the parameter. In maximum likelihood estimation, we choose the estimator $\hat \theta$ that maximizes the likelihood of having observed the particular sample that we obtained.

Clearly, $\theta$ cannot be smaller than the largest observation we made (or else we could not have observed it!). But in order to "maximize the likelihood," we also note that $\hat\theta$ should not be unnecessarily large. For instance, if we observed $x_1 = 2, x_2 = 7, x_3 = 4$, we could pick $\hat\theta = 100$, but intuitively this is not a good estimate: if the true value of the parameter were $100$, the chance that all three observations land at or below $7$ would be only $(7/100)^3 \approx 0.0003$. A value of $\theta$ close to $7$ explains the data much better. In fact, we can show mathematically that the likelihood of the sample for any $\theta > \max_i x_i$ is strictly smaller than for $\theta = \max_i x_i$, because the likelihood equals $1/\theta^n$ for every $\theta \ge \max_i x_i$ and this quantity shrinks as $\theta$ grows.
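To make the comparison concrete, here is a minimal Python sketch that evaluates the likelihood of the example sample $2, 7, 4$ at a few candidate values of $\theta$. The function name `uniform_likelihood` and the list of candidates are illustrative choices, not part of the original argument.

```python
def uniform_likelihood(theta, sample):
    """Likelihood of theta for an i.i.d. sample assumed to come from Uniform(0, theta)."""
    # Any theta below the largest observation is impossible: likelihood 0.
    if theta <= 0 or theta < max(sample) or min(sample) < 0:
        return 0.0
    # Otherwise each observation contributes a density of 1/theta.
    return theta ** (-len(sample))


sample = [2, 7, 4]
candidates = [5, 7, 8, 10, 100]  # illustrative candidate values for theta

likelihoods = {theta: uniform_likelihood(theta, sample) for theta in candidates}
best_theta = max(likelihoods, key=likelihoods.get)

for theta, value in likelihoods.items():
    print(f"theta = {theta:>3}: L = {value:.3e}")
print("largest likelihood among candidates at theta =", best_theta)
```

Among these candidates, $\theta = 5$ gets likelihood zero (it cannot produce the observation $7$), $\theta = 7$ gets $1/343$, and every larger candidate gets something smaller.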

We can also characterize $\hat\theta = \max_i x_i$ in terms of sufficiency: notice that once we observe the largest value, the other values give no additional information about the true value of $\theta$. So in our example, once we saw the maximum of $7$, the observations $x_1 = 2$ and $x_3 = 4$ are irrelevant for the purposes of estimating $\theta$: the likelihood depends on the data only through the sample size and the maximum, and the smaller observations only confirm what we already know, namely that $\theta \ge 7$.
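One way to see the sufficiency claim numerically is to compare two samples that share the same size and the same maximum. The sketch below does this in Python; the second sample and the grid of $\theta$ values are made up purely for illustration.

```python
def likelihood(theta, sample):
    """Likelihood of theta for an i.i.d. sample assumed to come from Uniform(0, theta)."""
    if theta < max(sample):
        return 0.0
    return theta ** (-len(sample))


sample_a = [2, 7, 4]       # the sample from the example
sample_b = [6.5, 0.1, 7]   # hypothetical sample: same size, same maximum

thetas = [6, 7, 7.5, 10, 50]  # illustrative grid of candidate values
curve_a = [likelihood(t, sample_a) for t in thetas]
curve_b = [likelihood(t, sample_b) for t in thetas]

# The two likelihood curves coincide even though the smaller observations differ.
same_curve = curve_a == curve_b
print(same_curve)
```

The two curves agree at every grid point, so the two samples lead to the same inference about $\theta$.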

Of course, we must formalize all of this with the appropriate mathematics, but this is essentially the underlying intuition involved.
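As a sketch of that formalization, assuming the observations are independent and nonnegative and writing $\mathbf{1}\{\cdot\}$ for the indicator function, the likelihood is

$$L(\theta) = \prod_{i=1}^n \frac{1}{\theta}\,\mathbf{1}\{0 \le x_i \le \theta\} = \frac{1}{\theta^n}\,\mathbf{1}\{\theta \ge \max_i x_i\}.$$

For $\theta < \max_i x_i$ the likelihood is $0$. For $\theta \ge \max_i x_i$ it equals $\theta^{-n}$, whose derivative $-n\,\theta^{-n-1}$ is strictly negative, so the likelihood is strictly decreasing there and never has a stationary point. The maximum is therefore attained at the left endpoint of the allowed region:

$$\hat\theta = \max_i x_i.$$

This is also why setting the derivative to zero does not work here: the maximum sits on the boundary of the region where the likelihood is positive, not at a point where the derivative vanishes.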