Solved – Confidence intervals for maximum likelihood estimator with constraints

confidence interval, constrained regression, estimation, fisher information, maximum likelihood

Let us suppose I have a maximum likelihood estimator for a multivariate parameter $\vec{\theta}$. The parameter is subject to the following constraints:

  1. $\theta_i \in [0,1]$
  2. $\sum_i \theta_i = 1$

I want to calculate confidence intervals for $\vec{\theta}$ and for sum functions of it. Is the standard approach, which assumes that $\hat{\vec{\theta}}$ is approximately normal with covariance matrix equal to the inverse of the Fisher information matrix $I(\hat{\vec{\theta}})$, still valid in the presence of the constraints? (My intuition is that $I(\hat{\vec{\theta}})$ might not be invertible in this case.) If the standard approach is not suitable, what are the alternatives? I thought that assuming a Dirichlet distribution for $\vec{\theta}$ might work, but I don't know the best way to fit its parameters to the MLE (moment matching?).

Best Answer

Your question is easy to answer if you are not too serious about $\theta_i\in[0,1]$. Is $\theta_i\in(0,1)$ good enough? Let's say it is. Then, instead of maximizing the likelihood function $L(\theta)$ in $\theta$, you are going to do a change of variables, and instead you maximize the likelihood function $L(\alpha)=L(\theta(\alpha))$ in $\alpha$.

What's $\theta(\alpha)$, you ask? Well, if $\theta$ is a $K$ dimensional vector, then we let $\alpha$ be a $(K-1)$ dimensional vector and set:

\begin{align} \theta_1 &= \frac{\exp(\alpha_1)}{1+\sum_{k=1}^{K-1} \exp(\alpha_k)} \\ \theta_2 &= \frac{\exp(\alpha_2)}{1+\sum_{k=1}^{K-1} \exp(\alpha_k)} \\ &\vdots\\ \theta_{K-1} &= \frac{\exp(\alpha_{K-1})}{1+\sum_{k=1}^{K-1} \exp(\alpha_k)} \\ \theta_K &= \frac{1}{1+\sum_{k=1}^{K-1} \exp(\alpha_k)} \end{align}

After you substitute $\theta(\alpha)$ into your likelihood function, you can maximize it unconstrained: each $\alpha_k$ can be any real number, and the map $\theta(\alpha)$ magically imposes all your constraints on $\theta$. So now the usual theorems proving consistency and asymptotic normality of the MLE follow.
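To make this concrete, here is a minimal sketch of the reparameterized fit, assuming a multinomial likelihood with hypothetical category counts (the data, function names, and optimizer choice are illustrative, not part of the original answer):

```python
import numpy as np
from scipy.optimize import minimize

def theta_from_alpha(alpha):
    """Map unconstrained alpha (length K-1) onto the simplex theta (length K)."""
    e = np.exp(np.append(alpha, 0.0))  # the K-th component has implicit alpha_K = 0
    return e / e.sum()

def neg_log_lik(alpha, counts):
    """Multinomial negative log-likelihood in the alpha parameterization."""
    theta = theta_from_alpha(alpha)
    return -np.sum(counts * np.log(theta))

counts = np.array([30.0, 50.0, 20.0])  # hypothetical observed counts
res = minimize(neg_log_lik, x0=np.zeros(len(counts) - 1), args=(counts,))
theta_hat = theta_from_alpha(res.x)    # automatically in [0,1] and summing to 1
```

For the multinomial case the optimum recovers the familiar closed form $\hat{\theta}_i = n_i/n$, which is a quick sanity check on the reparameterization.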

What about $\theta$, though? Well, after you have estimated the $\alpha$, you just substitute them into the formulas above to get your estimator for $\theta$. What is the distribution of $\hat{\theta}$? By the delta method, it is asymptotically normal with mean $\theta_0$, the true value of $\theta$, and variance $V(\hat{\theta}) = J \, V(\hat{\alpha}) \, J'$, where $J = \partial\theta/\partial\alpha$ is the $K \times (K-1)$ Jacobian of the map above.
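The delta-method covariance can be sketched as follows; the particular $\hat{\theta}$ and $V(\hat{\alpha})$ values are hypothetical placeholders, and the Jacobian formula $\partial\theta_i/\partial\alpha_j = \theta_i(\delta_{ij} - \theta_j)$ follows from differentiating the map above with the convention $\alpha_K = 0$:

```python
import numpy as np

def softmax_jacobian(theta):
    """Jacobian d theta / d alpha for theta = softmax([alpha, 0]); shape (K, K-1)."""
    K = len(theta)
    return np.diag(theta)[:, : K - 1] - np.outer(theta, theta[: K - 1])

theta_hat = np.array([0.3, 0.5, 0.2])   # hypothetical MLE on the simplex
V_alpha = np.array([[0.05, 0.01],       # hypothetical covariance of alpha_hat,
                    [0.01, 0.04]])      # e.g. from the inverse Hessian at the optimum

J = softmax_jacobian(theta_hat)
V_theta = J @ V_alpha @ J.T             # delta-method covariance of theta_hat
```

Note that every column of $J$ sums to zero, so every row and column of `V_theta` sums to zero as well: the resulting $K \times K$ matrix has rank at most $K-1$, exactly the singularity discussed next.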

As you say, $V(\hat{\theta})$ won't be full rank. Obviously, it can't be full rank. Why not? Because we know the variance of $\sum \hat{\theta}_i$ has to be zero---this sum is always 1, so its variance must be zero. A non-invertible variance matrix is not a problem, however, unless you are using it for some purpose it can't be used for (say to test the null hypothesis that $\sum \theta_i = 1$). If you are trying to do that, then the error message telling you that you can't divide by zero is an excellent warning that you are doing something silly.

What if you are serious about including the endpoints of your interval? Well, that's much harder. What I would suggest is that you think about whether you are really serious. For example, if the $\theta_i$ are probabilities (and that's what your constraints make me think they are), then you really should not be expecting the usual maximum likelihood procedures to give you correct standard errors.

For example, if $\theta_1$ is the probability of heads and $\theta_2$ is the probability of tails, and your dataset looks like ten heads in a row, then the maximum likelihood estimate is $\hat{\theta}_1=1$ and $\hat{\theta}_2=0$. What's the variance of the maximum likelihood estimator evaluated at this estimate? Zero.

If you want to test the null hypothesis that $\theta_1=0.5$, what do you do? You sure don't do this: "Reject null if $\left|\frac{\hat{\theta}_1-0.5}{\sqrt{\hat{V}(\hat{\theta}_1)}}\right|>1.96$." Instead, you calculate the probability that you get ten heads in a row with a fair coin. If that probability is lower than whatever significance level you picked, then you reject.
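That exact calculation is a one-liner; this sketch computes the one-sided probability the answer describes, with the two-sided version (doubling, since the null is symmetric at $0.5$) noted alongside:

```python
from scipy.stats import binom

# Probability of ten heads in ten tosses under the null theta_1 = 0.5.
p_ten_heads = binom.pmf(10, 10, 0.5)   # = 0.5**10, about 0.00098

# Two-sided version: count "ten tails" as equally extreme.
p_two_sided = 2 * p_ten_heads          # about 0.00195
```

Either way the p-value is far below conventional significance levels, so the fair-coin null is rejected without ever dividing by the degenerate variance estimate.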
