Consider a vector of parameters $(\theta_1, \theta_2)$, with $\theta_1$ the parameter of interest, and $\theta_2$ a nuisance parameter.
If $L(\theta_1, \theta_2 ; x)$ is the likelihood constructed from the data $x$, the profile likelihood for $\theta_1$ is defined as $L_P(\theta_1 ; x) = L(\theta_1, \hat{\theta}_2(\theta_1) ; x)$ where $ \hat{\theta}_2(\theta_1)$ is the MLE of $\theta_2$ for a fixed value of $\theta_1$.
$\bullet$ Maximising the profile likelihood with respect to $\theta_1$ leads to the same estimate $\hat{\theta}_1$ as maximising the likelihood jointly with respect to $(\theta_1, \theta_2)$.
$\bullet$ I think the standard deviation of $\hat{\theta}_1$ may also be estimated from the second derivative of the log profile likelihood (i.e., its curvature at the maximum).
$\bullet$ The likelihood ratio statistic for $H_0: \theta_1 = \theta_0$ can be written in terms of the profile likelihood: $LR = 2 \log\left( \tfrac{L_P(\hat{\theta}_1 ; x)}{L_P(\theta_0 ; x)}\right)$.
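The first and third bullets can be checked numerically. Here is a small sketch with a normal model, taking the mean as the parameter of interest and the variance as the nuisance parameter (the model, seed, and all names are illustrative choices, not from the question; the conditional MLE of the variance at fixed $\mu$ is $\frac{1}{n}\sum_i (x_i - \mu)^2$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)
n = len(x)

def log_profile(mu):
    # For fixed mu, the MLE of the variance is mean((x - mu)^2);
    # plugging it back in gives the profile log-likelihood of mu.
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

# maximise the profile likelihood over mu alone
mu_profile = minimize_scalar(lambda m: -log_profile(m),
                             bounds=(x.min(), x.max()), method="bounded").x

mu_joint = x.mean()  # joint MLE of mu in the normal model, known in closed form

# likelihood-ratio statistic for H0: mu = 2.0, written with the profile likelihood
lr = 2 * (log_profile(mu_profile) - log_profile(2.0))
```

In this run `mu_profile` agrees with `mu_joint` up to optimizer tolerance, and `lr` is non-negative since the profile likelihood is maximised at `mu_profile`.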
So it seems that the profile likelihood can be used exactly as if it were a genuine likelihood. Is that really the case? What are the main drawbacks of this approach? And what about the 'rumor' that the estimator obtained from the profile likelihood is biased (edit: even asymptotically)?
Best Answer
The estimate of $\theta_1$ from the profile likelihood is just the MLE. Maximizing with respect to $\theta_2$ for each possible $\theta_1$ and then maximizing with respect to $\theta_1$ is the same as maximizing with respect to $(\theta_1, \theta_2)$ jointly.
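This equivalence is easy to see numerically in a model where the nuisance parameter has a closed-form conditional MLE. A sketch with a Gamma model (shape as the parameter of interest, scale as the nuisance; the model and all names are illustrative, not from the answer — at fixed shape $a$, the MLE of the scale is $\bar{x}/a$):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(42)
x = rng.gamma(shape=3.0, scale=1.5, size=300)
n, sx, slx = len(x), x.sum(), np.log(x).sum()

def nll(a, s):
    # negative log-likelihood of a Gamma(shape=a, scale=s) sample
    return n * (a * np.log(s) + gammaln(a)) + sx / s - (a - 1) * slx

def nll_profile(a):
    # profile out the scale: for fixed shape a, the MLE of s is xbar / a
    return nll(a, x.mean() / a)

# 1-D maximisation of the profile likelihood over the shape
a_profile = minimize_scalar(nll_profile, bounds=(0.1, 20), method="bounded").x

# joint 2-D maximisation over (shape, scale)
joint = minimize(lambda p: nll(p[0], p[1]), x0=[1.0, 1.0],
                 bounds=[(0.1, 20), (0.01, 20)], method="L-BFGS-B")
a_joint = joint.x[0]
```

Both routes land on the same shape estimate up to optimizer tolerance, as the answer states.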
The key weakness is that, if you base your estimate of the SE of $\hat{\theta}_1$ on the curvature of the profile likelihood, you are not fully accounting for the uncertainty in $\theta_2$.
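As an illustration of the curvature-based estimate, here is a sketch with a normal model (all names and the model choice are illustrative). In this particular model the mean and variance are orthogonal, so the profile curvature happens to reproduce the usual plug-in SE of the mean; that agreement should not be expected in general:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=500)
n = len(x)
mu_hat = x.mean()

def nll_profile(mu):
    # negative profile log-likelihood of mu (variance profiled out)
    s2 = np.mean((x - mu) ** 2)
    return 0.5 * n * (np.log(2 * np.pi * s2) + 1)

# SE from the curvature of the profile log-likelihood at the MLE
# (central-difference approximation of the second derivative)
h = 1e-3
d2 = (nll_profile(mu_hat + h) - 2 * nll_profile(mu_hat)
      + nll_profile(mu_hat - h)) / h ** 2
se_curvature = 1.0 / np.sqrt(d2)

se_plugin = np.sqrt(np.mean((x - mu_hat) ** 2) / n)  # usual plug-in SE of the mean
```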
McCullagh and Nelder, Generalized Linear Models (2nd ed.), have a short section on profile likelihood (Sec. 7.2.4, pp. 254-255). They say: