Likelihood – Disadvantages of Profile Likelihood

likelihood, maximum likelihood, profile-likelihood

Consider a vector of parameters $(\theta_1, \theta_2)$, with $\theta_1$ the parameter of interest, and $\theta_2$ a nuisance parameter.

If $L(\theta_1, \theta_2 ; x)$ is the likelihood constructed from the data $x$, the profile likelihood for $\theta_1$ is defined as $L_P(\theta_1 ; x) = L(\theta_1, \hat{\theta}_2(\theta_1) ; x)$, where $\hat{\theta}_2(\theta_1)$ is the MLE of $\theta_2$ for a fixed value of $\theta_1$.

$\bullet$ Maximising the profile likelihood with respect to $\theta_1$ leads to the same estimate $\hat{\theta}_1$ as the one obtained by maximising the likelihood simultaneously with respect to $\theta_1$ and $\theta_2$.

$\bullet$ I think the standard deviation of $\hat{\theta}_1$ may also be estimated from the second derivative of the profile log-likelihood at its maximum.

$\bullet$ The likelihood ratio statistic for $H_0: \theta_1 = \theta_0$ can be written in terms of the profile likelihood: $LR = 2 \log\left( \tfrac{L_P(\hat{\theta}_1 ; x)}{L_P(\theta_0 ; x)} \right)$.
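As a concrete illustration of the last bullet, here is a minimal Python sketch. It assumes an i.i.d. normal model with $\theta_1 = \mu$ as the parameter of interest and $\theta_2 = \sigma^2$ as the nuisance parameter; the model, the function names and the data are made up for illustration and are not part of the original question.

```python
import math


def profile_loglik_mu(mu, x):
    """Profile log-likelihood of mu for i.i.d. N(mu, sigma2) data.

    For fixed mu, the MLE of the nuisance parameter sigma2 is
    mean((x - mu)^2); plugging it back in gives the expression below.
    """
    n = len(x)
    sigma2_hat = sum((xi - mu) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def lr_statistic(mu0, x):
    """LR = 2 * [log L_P(mu_hat) - log L_P(mu0)] for H0: mu = mu0."""
    mu_hat = sum(x) / len(x)  # maximiser of the profile likelihood
    return 2.0 * (profile_loglik_mu(mu_hat, x) - profile_loglik_mu(mu0, x))


# Hypothetical data, for illustration only.
x = [4.9, 5.6, 5.1, 6.2, 5.8, 4.7, 5.3, 6.0]
mu0 = 5.0
lr = lr_statistic(mu0, x)
print("LR statistic:", lr)
```

To first order, the statistic would be compared with a $\chi^2_1$ quantile (about 3.84 at the 5% level).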

So it seems that the profile likelihood can be used exactly as if it were a genuine likelihood. Is that really the case? What are the main drawbacks of this approach? And what about the 'rumor' that the estimator obtained from the profile likelihood is biased (edit: even asymptotically)?

Best Answer

The estimate of $\theta_1$ from the profile likelihood is just the MLE. Maximizing with respect to $\theta_2$ for each possible $\theta_1$ and then maximizing with respect to $\theta_1$ is the same as maximizing with respect to $(\theta_1, \theta_2)$ jointly.
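A quick numerical check of this equivalence, as a sketch under assumptions: an i.i.d. normal model with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$, made-up data, and a hand-written golden-section search (hypothetical helper, standard library only). The profile likelihood is maximised numerically over $\mu$ and compared with the closed-form joint MLE.

```python
import math


def sigma2_given_mu(mu, x):
    """MLE of the nuisance parameter sigma2 for a fixed value of mu."""
    return sum((xi - mu) ** 2 for xi in x) / len(x)


def profile_loglik_mu(mu, x):
    """Profile log-likelihood of mu: the likelihood at (mu, sigma2_hat(mu))."""
    n = len(x)
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_given_mu(mu, x)) + 1.0)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


# Hypothetical data, for illustration only.
x = [4.9, 5.6, 5.1, 6.2, 5.8, 4.7, 5.3, 6.0]

# Two-step route: profile out sigma2, then maximise over mu.
mu_profile = golden_section_max(lambda m: profile_loglik_mu(m, x), min(x), max(x))
sigma2_profile = sigma2_given_mu(mu_profile, x)

# Joint MLE for the normal model, available in closed form.
mu_joint = sum(x) / len(x)
sigma2_joint = sum((xi - mu_joint) ** 2 for xi in x) / len(x)

print("profile route:", mu_profile, sigma2_profile)
print("joint MLE:    ", mu_joint, sigma2_joint)
```

The two routes should agree up to the tolerance of the numerical search.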

The key weakness is that, if you base your estimate of the SE of $\hat{\theta}_1$ on the curvature of the profile likelihood, you are not fully accounting for the uncertainty in $\theta_2$.

McCullagh and Nelder, Generalized Linear Models, 2nd edition, has a short section on profile likelihood (Sec. 7.2.4, pp. 254-255). They say:

[A]pproximate confidence sets may be obtained in the usual way....such confidence intervals are often satisfactory if [the dimension of $\theta_2$] is small in relation to the total Fisher information, but are liable to be misleading otherwise.... Unfortunately [the profile log likelihood] is not a log likelihood function in the usual sense. Most obviously, its derivative does not have zero mean, a property that is essential for estimating equations.
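
To see the non-zero-mean derivative concretely, here is a standard textbook case (my own illustration, not taken from McCullagh and Nelder): i.i.d. $N(\mu, \sigma^2)$ data with $\theta_1 = \sigma^2$ of interest and $\theta_2 = \mu$ as the nuisance parameter. Here $\hat{\mu}(\sigma^2) = \bar{x}$ for every $\sigma^2$, so writing $S = \sum_{i=1}^n (x_i - \bar{x})^2$,

$$\ell_P(\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{S}{2\sigma^2}, \qquad \frac{\partial \ell_P}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{S}{2\sigma^4}.$$

Since $E[S] = (n-1)\sigma^2$, the profile score has expectation

$$E\left[\frac{\partial \ell_P}{\partial \sigma^2}\right] = -\frac{n}{2\sigma^2} + \frac{n-1}{2\sigma^2} = -\frac{1}{2\sigma^2} \neq 0.$$

Setting the profile score to zero gives $\hat{\sigma}^2 = S/n$, whose expectation is $\tfrac{n-1}{n}\sigma^2$. With a single nuisance parameter this bias vanishes as $n \to \infty$; when the number of nuisance parameters grows with $n$ it need not vanish, which is consistent with the caveat in the quote about the dimension of $\theta_2$.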