Confidence Intervals – How Bayesian and Frequentist Approaches Differ in Handling Nuisance Parameters

Tags: confidence-interval, credible-interval

The Wikipedia article on credible intervals has the following statement:

credible intervals and confidence intervals treat nuisance parameters in radically different ways.

What is the radical difference that the article is talking about?

Credible intervals are based on the posterior distribution of the parameter, while confidence intervals are based on the maximum likelihood of the data-generating process. It seems to me that how credible and confidence intervals are computed does not depend on whether the parameters are nuisance parameters or not, so I am a bit puzzled by this statement.

PS: I am aware of alternative approaches to dealing with nuisance parameters under frequentist inference but I think they are less common than standard maximum likelihood. (See this question on the difference between partial, profile and marginal likelihoods.)

Best Answer

The fundamental difference is that in maximum-likelihood-based methods we can't integrate the nuisance parameters out, because the likelihood function is not a PDF in the parameters and doesn't obey the laws of probability.
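
To see the obstacle concretely (a standard illustration, not from the original post): for a single observation $X = x$ from $\mathrm{Bin}(n, p)$,

$\int_0^1 \binom{n}{x} p^x (1-p)^{n-x} \, dp = \frac{1}{n+1} \neq 1$,

so the likelihood is not a density in $p$, and integrating a joint likelihood over a nuisance parameter has no probabilistic justification unless a prior supplies the measure.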

In maximum likelihood methods, the ideal way to deal with nuisance parameters is through marginal or conditional likelihoods, but these are defined differently from the likelihoods in the question you linked. (There is a notion of an integrated (marginal/conditional) likelihood function as in the linked question, but that is not strictly the marginal likelihood function.)

Say you have a parameter of interest, $\theta$, and a nuisance parameter, $\lambda$. Suppose a transformation of your data $X$ to $(Y, Z)$ exists such that the distribution of either $Y$ or $Y|Z$ depends only on $\theta$. If the distribution of $Y$ depends only on $\theta$, then the joint density can be written

$f(Y, Z; \theta, \lambda) = f_{Y}(Y; \theta) f_{Z|Y}(Z|Y; \theta, \lambda)$.

In the latter case, we have

$f(Y, Z; \theta, \lambda) = f_{Y|Z}(Y|Z; \theta) f_{Z}(Z; \theta, \lambda)$.

In either case, the factor depending on $\theta$ alone is of interest. In the former, it's the basis for the definition of the marginal likelihood and in the latter, the conditional likelihood. The important point here is to isolate a component that depends on $\theta$ alone.
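
As a concrete illustration (a standard textbook example, not taken from the linked question): let $X_1, \ldots, X_n \sim N(\lambda, \theta)$, with the variance $\theta$ of interest and the mean $\lambda$ a nuisance. Transforming to $Y = S^2 = \frac{1}{n-1}\sum_{i}(X_i - \bar{X})^2$ and $Z = \bar{X}$, the two statistics are independent and $(n-1)S^2/\theta \sim \chi^2_{n-1}$, so

$f(Y, Z; \theta, \lambda) = f_{Y}(Y; \theta) f_{Z}(Z; \theta, \lambda)$

(here $f_{Z|Y} = f_{Z}$ by independence), and $f_{Y}(Y; \theta)$ is a marginal likelihood for $\theta$ that is entirely free of $\lambda$; it is the REML likelihood for the variance.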

If we can't find such a transformation, we look at other likelihood functions to eliminate the nuisance parameter, usually starting with a profile likelihood. To reduce the bias that profiling introduces into the MLE, we then try to obtain approximations to marginal or conditional likelihoods, usually through a "modified profile likelihood" function (yet another likelihood function!).
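
Here is a minimal numerical sketch of the normal example above, contrasting the profile likelihood with the marginal (REML-type) likelihood; the data and the function names (`profile_loglik`, `marginal_loglik`) are illustrative assumptions, not from the original answer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10)  # theta = sigma^2 = 4 is of interest
n = x.size
ss = np.sum((x - x.mean()) ** 2)  # residual sum of squares

def profile_loglik(theta):
    # Profile out the nuisance lambda = mu: its maximizer is x-bar for every theta.
    return -0.5 * n * np.log(theta) - ss / (2 * theta)

def marginal_loglik(theta):
    # Marginal (REML-type) likelihood based on S^2 alone:
    # (n - 1) S^2 / theta ~ chi^2_{n-1}, which involves no nuisance parameter.
    return -0.5 * (n - 1) * np.log(theta) - ss / (2 * theta)

thetas = np.linspace(0.5, 20.0, 4000)
print(thetas[np.argmax(profile_loglik(thetas))])   # ~ ss / n       (biased low)
print(thetas[np.argmax(marginal_loglik(thetas))])  # ~ ss / (n - 1) (bias removed)
```

The profile likelihood peaks at $\hat\theta = \mathrm{ss}/n$, the familiar downward-biased variance MLE, while the marginal likelihood peaks at $\mathrm{ss}/(n-1)$; this is exactly the kind of bias the modified profile likelihood is designed to correct.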

There are many details, but the short story is that likelihood methods treat nuisance parameters quite differently than Bayesian methods do. In particular, estimated (profile-type) likelihoods don't account for uncertainty in the nuisance parameters, whereas Bayesian methods account for it through the specification of a prior.
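
A minimal sketch of the Bayesian side (hypothetical data, not from the original answer): interval for a normal mean $\mu$ when the variance is a nuisance parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10)
n, xbar, s = x.size, x.mean(), x.std(ddof=1)

# Bayesian: under the reference prior p(mu, sigma^2) proportional to 1/sigma^2,
# integrating sigma^2 out of the joint posterior leaves a Student-t marginal
# posterior, mu | x ~ xbar + (s / sqrt(n)) * t_{n-1}. The heavier t tails are
# precisely the extra uncertainty contributed by the unknown nuisance.
credible = xbar + (s / np.sqrt(n)) * stats.t.ppf([0.025, 0.975], df=n - 1)

# Plug-in analogue: fix sigma at its estimate and use normal quantiles,
# ignoring the uncertainty in the nuisance parameter entirely.
plug_in = xbar + (s / np.sqrt(n)) * stats.norm.ppf([0.025, 0.975])

print(credible)  # wider: nuisance uncertainty integrated out via the prior
print(plug_in)   # narrower: understates the uncertainty
```

The credible interval is wider than the plug-in interval precisely because the posterior averages over the nuisance parameter rather than fixing it at an estimate.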

There are also arguments in favor of an integrated likelihood function, and these lead to something resembling the Bayesian framework. If you're interested, I can dig up some references.
