Solved – MLE: Marginal vs Full Likelihood

likelihood, marginal-distribution, maximum-likelihood, profile-likelihood

Suppose I have a statistical model with parameters $\boldsymbol{\theta}=\{\theta_1,\theta_2,\dots,\theta_n\}$ of which only a single parameter, say $\theta_1$, is of interest to me. Suppose also that I can write down the full likelihood function for the model, $\mathcal{L}(\boldsymbol{\theta};\mathbf{x})$.

Using the method of maximum likelihood I can estimate $\theta_1$ by maximising $\mathcal{L}(\boldsymbol{\theta};\mathbf{x})$ with respect to $\boldsymbol{\theta}$ to obtain $\hat{\boldsymbol{\theta}}$ and retrieving the component $\hat{\theta}_1$ of $\hat{\boldsymbol{\theta}}$.

Alternatively I can maximise the marginal likelihood $\mathcal{L}(\theta_1;\mathbf{x}) = \idotsint \mathcal{L}(\boldsymbol{\theta};\mathbf{x}) \,d\theta_2 \dots d\theta_n$, which is calculated by integrating over all possible values of $\{\theta_2,\dots,\theta_n\}$.
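
To make the comparison concrete, here is a minimal numerical sketch of the two approaches, assuming a normal model in which $\theta_1$ is the mean and $\theta_2$ is the scale; the model, the flat weighting of the nuisance, the finite integration range, and the search grid are all illustrative assumptions, not part of the question itself:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.integrate import quad
from scipy.stats import norm

x = np.random.default_rng(0).normal(1.0, 2.0, 50)  # toy data, true theta_1 = 1

# Approach 1 -- full likelihood: maximise jointly over (theta_1, log theta_2)
# and keep the first component of the joint maximiser.
neg_log_lik = lambda p: -norm.logpdf(x, loc=p[0], scale=np.exp(p[1])).sum()
theta1_hat = minimize(neg_log_lik, x0=[0.0, 0.0]).x[0]

# Approach 2 -- marginal likelihood: integrate the likelihood over the
# nuisance theta_2 (flat weight; a finite range stands in for (0, inf)).
def marginal_lik(mu):
    integrand = lambda s: np.exp(norm.logpdf(x, loc=mu, scale=s).sum())
    return quad(integrand, 1e-3, 20.0)[0]

grid = np.linspace(0.0, 2.0, 101)
theta1_marg = grid[int(np.argmax([marginal_lik(m) for m in grid]))]
print(theta1_hat, theta1_marg)
```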

Under what circumstances (if any) is the latter approach preferable, bearing in mind that I am only interested in the value of $\theta_1$?

Best Answer

The usual way of doing likelihood inference on a parameter of interest in the presence of nuisance parameters is to use the profile likelihood function. In your context, the profile likelihood is:

$$\mathcal{L}_P(\theta_1;\mathbf{x}) = \max_{\theta_2,\dots,\theta_n} \mathcal{L}(\boldsymbol{\theta};\mathbf{x}).$$

The object of interest is the normalized profile likelihood, which is nothing but

$$R_P(\theta_1;\mathbf{x}) = \frac{\mathcal{L}_P(\theta_1;\mathbf{x})}{\mathcal{L}(\widehat{\boldsymbol{\theta}};\mathbf{x})}.$$

This function can be used to construct confidence intervals on the parameter of interest. A thorough study of the profile likelihood can be found in:

Sprott, David A. Statistical Inference in Science. Springer Science & Business Media, 2008.
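
Continuing the illustrative normal model sketched in the question, here is what the profile likelihood, its normalized version, and the resulting likelihood-ratio confidence interval can look like; the closed-form nuisance maximiser holds for this particular model, and the $\chi^2_{1,\,0.95}$ cut-off $3.841$ is the standard asymptotic calibration, used here only as an example:

```python
import numpy as np

def profile_loglik(mu, x):
    # For fixed theta_1 = mu, the nuisance (the normal variance) has a
    # closed-form maximiser in this model: sigma2_hat(mu) = mean((x - mu)^2).
    s2 = np.mean((x - mu) ** 2)
    n = len(x)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

x = np.random.default_rng(1).normal(1.0, 2.0, 50)   # toy data
mu_hat = x.mean()                                   # joint MLE of theta_1
grid = np.linspace(mu_hat - 2.0, mu_hat + 2.0, 401)
log_R = np.array([profile_loglik(m, x) for m in grid]) - profile_loglik(mu_hat, x)
R = np.exp(log_R)                                   # normalized profile likelihood

# Approximate 95% interval: keep mu where R(mu) >= exp(-chi2_{1,0.95} / 2)
ci = grid[R >= np.exp(-3.841 / 2.0)]
print(mu_hat, ci.min(), ci.max())
```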

In some cases, people assign a distribution to the "nuisance parameters" ($\theta_2,\dots,\theta_n$, in your case) and integrate them out. This is a hybrid between Bayesian and classical inference, and it is called the integrated likelihood. However, it requires assuming a distribution on the nuisance parameters in order to guarantee that the integral is finite. See:

Berger, James O., Brunero Liseo, and Robert L. Wolpert. "Integrated likelihood methods for eliminating nuisance parameters." Statistical Science 14.1 (1999): 1-28.

Note that, if you do not assign a proper distribution to the nuisance parameters, there is no guarantee that the marginal/integrated likelihood function is finite. Using a proper distribution $\pi(\theta_2, \dots, \theta_n)$ guarantees that

$$\mathcal{L}(\theta_1;\mathbf{x}) = \idotsint \mathcal{L}(\boldsymbol{\theta};\mathbf{x}) \, \pi(\theta_2, \dots, \theta_n) \,d\theta_2 \dots d\theta_n < \infty,$$

by Bayes' theorem (for regular models).
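
A minimal sketch of this integrated likelihood, again under the illustrative normal model: the inverse-gamma prior on the nuisance variance, its hyperparameters, and the finite quadrature range are all assumed choices for the example; the point is only that a proper $\pi$ keeps the integral finite.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import invgamma, norm

x = np.random.default_rng(2).normal(1.0, 2.0, 30)  # toy data

# Integrated likelihood of theta_1: weight the nuisance variance by a
# proper prior (an inverse-gamma here, an assumed illustrative choice)
# and integrate it out numerically.
def integrated_lik(mu, a=2.0, b=2.0):
    def integrand(s2):
        return np.exp(norm.logpdf(x, loc=mu, scale=np.sqrt(s2)).sum()) \
               * invgamma.pdf(s2, a, scale=b)
    return quad(integrand, 1e-6, 50.0)[0]          # finite range stands in for (0, inf)

grid = np.linspace(-1.0, 3.0, 81)
vals = [integrated_lik(m) for m in grid]
print(grid[int(np.argmax(vals))])                  # maximizer of the integrated likelihood
```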
