Maximum Likelihood – Applying MLE on a Restricted Parameter Space

maximum-likelihood, normal-distribution, self-study

Suppose $X_1, \ldots, X_n$ are a random sample from a normal distribution with mean $\theta$ and variance $1$. Find the maximum likelihood estimator of $\theta$, under the restriction that $\theta \geq 0$.

I have found the MLE when there is no restriction on $\theta$.

$$L(\theta|\mathbf{x})=(2\pi)^{-n/2} \exp\Big(-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2 \Big)$$

Differentiating the log-likelihood $\ln L(\theta|\mathbf{x})$ with respect to $\theta$ and setting the derivative equal to zero yields $$\sum_{i=1}^{n}(x_i-\theta)=0 \implies \theta=\frac{1}{n}\sum_{i=1}^{n}x_i =\bar{x}$$
The second derivative is $-n < 0$, so this critical point is a maximum; hence the unrestricted MLE of $\theta$ is $\hat{\theta}=\bar{X}$.
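As a quick numerical sanity check (a minimal sketch; the simulated data, the seed, and the use of `scipy.optimize.minimize_scalar` are illustrative choices, not part of the derivation), maximizing the log-likelihood over an unrestricted $\theta$ should recover the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=200)  # sample with true theta = 1.5

def neg_loglik(theta):
    # Negative log-likelihood of N(theta, 1), dropping the additive constant
    return 0.5 * np.sum((x - theta) ** 2)

res = minimize_scalar(neg_loglik)
print(res.x, x.mean())  # the two values should agree closely
```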

How would I incorporate the restriction $\theta \geq 0$?

Best Answer

Although the OP did not respond, I am answering this to showcase the method I proposed (and to indicate what statistical intuition it may contain).

First, it is important to distinguish which entity the constraint is imposed on. In a deterministic optimization setting there is no such issue: there is no "true value" and no estimator of it; we just have to find the optimizer. But in a stochastic setting, there are conceivably two different cases:

a) "Estimate the parameter given a sample that has been generated by a population that has a non-negative mean" (i.e. $\theta \ge 0$) and
b) "Estimate the parameter under the constraint that your estimator cannot take negative values"(i.e. $\hat \theta \ge 0$).

In the first case, imposing the constraint is including prior knowledge on the unknown parameter. In the second case, the constraint can be seen as reflecting a prior belief on the unknown parameter (or some technical, or "strategic", limitation of the estimator).

The mechanics of the solution are the same, though. The objective function (the log-likelihood augmented by the non-negativity constraint on $\theta$ via a Karush-Kuhn-Tucker multiplier $\xi$) is

$$\tilde L(\theta|\mathbf{x})=-\frac n2 \ln(2\pi)-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2 +\xi\theta,\qquad \xi\ge 0 $$

Given concavity, the first-order condition is also sufficient for a global maximum. We have

$$\frac {\partial}{\partial \theta}\tilde L(\theta|\mathbf{x})=\sum_{i=1}^{n}(x_i-\theta) +\xi = 0 \Rightarrow \hat \theta = \bar x+\frac{\xi}{n} $$

1) If the solution lies at an interior point ($\Rightarrow \hat \theta >0$), then $\xi=0$, and so the solution is $\{\hat \theta= \bar x>0,\; \xi^*=0\}$.

2) If the solution lies on the boundary ($\Rightarrow \hat \theta =0$), then we obtain the value of the multiplier at the solution, $\xi^* = -n\bar x$, and so the full solution is $\{\hat \theta= 0,\; \xi^*=-n\bar x\}$. But since the multiplier must be non-negative, this necessarily implies that in this case we would have $\bar x\le 0$.

(There is nothing special about setting the constraint at zero. If, say, the constraint were $\theta \ge -2$, then if the solution lay on the boundary, $\hat \theta = -2$, it would imply, in order for the multiplier to be non-negative, that $\bar x \le -2$.)
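Putting the two cases together, the constrained MLE has the closed form $\hat\theta = \max\{\bar x, 0\}$. A minimal sketch (my own illustration; the data, seed, and the finite upper bound required by `scipy.optimize.minimize_scalar`'s bounded method are assumptions) confirms that numerical constrained maximization agrees with this closed form in both regimes:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def constrained_mle(x):
    """Numerically maximize the N(theta, 1) log-likelihood subject to theta >= 0."""
    neg_loglik = lambda theta: 0.5 * np.sum((x - theta) ** 2)
    # The "bounded" method needs a finite interval; the upper bound here is
    # an arbitrary value safely above the unconstrained optimum.
    res = minimize_scalar(neg_loglik, bounds=(0.0, max(x.mean(), 0.0) + 10.0),
                          method="bounded")
    return res.x

rng = np.random.default_rng(1)
for true_theta in (2.0, -2.0):  # interior case vs. boundary case
    x = rng.normal(loc=true_theta, size=100)
    print(true_theta, constrained_mle(x), max(x.mean(), 0.0))
```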

So, if the optimizer is $0$, what are we facing here? If we are in "constraint type-a", i.e. we have been told that the sample comes from a population that has a non-negative mean, then with $\hat \theta =0$ chances are that the sample may not be representative of this population.

If we are in "constraint type-b", i.e. we had the belief that the population has a non-negative mean, with $\hat \theta =0$ this belief is questioned.

(This is essentially an alternative way to deal with prior beliefs, outside the formal Bayesian approach.)

Regarding the properties of the estimator, one should carefully distinguish this constrained-estimation case from the case where the true parameter lies on the boundary of the parameter space.
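To illustrate why that boundary case is special (a Monte Carlo sketch of my own, not from the original answer; the sample size and replication count are arbitrary): when the true parameter is $\theta = 0$, the constrained estimator $\hat\theta = \max\{\bar X, 0\}$ equals zero whenever $\bar X \le 0$, which happens roughly half the time, so its sampling distribution has a point mass at $0$ instead of the usual asymptotic normality.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100_000
# When theta = 0, the sample mean of n iid N(0, 1) draws is N(0, 1/n),
# so we can simulate the sample means directly.
xbar = rng.normal(loc=0.0, scale=1.0 / np.sqrt(n), size=reps)
theta_hat = np.maximum(xbar, 0.0)  # constrained MLE, max(xbar, 0)

print("P(theta_hat == 0):", np.mean(theta_hat == 0.0))  # close to 0.5
```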