Solved – In the most basic sense, what is a marginal likelihood?

bayesian, likelihood, probability

I understand that a likelihood differs from a probability distribution: a likelihood describes how plausible particular parameter values are given the data you have observed (it is essentially a function of the parameters, determined by the observed data), while a probability distribution describes the probability of observing certain values given fixed parameter values. But what is a marginal likelihood, and how does it relate to posterior distributions? (Preferably explained with little or no probability notation, so that the explanation is more intuitive.) Any examples would be great as well.

Best Answer

In Bayesian statistics, the marginal likelihood $$m(x) = \int_\Theta f(x|\theta)\pi(\theta)\,\text d\theta$$ where

  1. $x$ is the sample
  2. $f(x|\theta)$ is the sampling density, which is proportional to the model likelihood
  3. $\pi(\theta)$ is the prior density

is a misnomer in that

  1. it is not a likelihood function [as a function of the parameter], since the parameter is integrated out (i.e., the likelihood function is averaged against the prior measure),
  2. it is a density in the observations, namely the predictive density of the sample,
  3. unlike a likelihood function, it is not defined merely up to a multiplicative constant,
  4. it does not depend on the data solely through the sufficient statistics.
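
As a concrete illustration (a standard conjugate example, not part of the original answer): suppose $x$ is the number of successes in $n$ Bernoulli trials, so $f(x|\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$, and take a Beta prior $\pi(\theta) = \theta^{a-1}(1-\theta)^{b-1}/B(a,b)$. Then $$m(x) = \int_0^1 \binom{n}{x}\theta^x(1-\theta)^{n-x}\,\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}\,\text d\theta = \binom{n}{x}\,\frac{B(a+x,\,b+n-x)}{B(a,b)},$$ a single number once $x$ is observed: the probability that the model as a whole, prior included, assigns to the data.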

Other names for $m(x)$ are the evidence, the prior predictive (density), and the partition function. It nevertheless plays several important roles:

  1. it is the normalising constant of the posterior distribution $$\pi(\theta|x) = \dfrac{f(x|\theta)\pi(\theta)}{m(x)}$$ (see the numerical sketch after this list);
  2. in model comparison, it is the contribution of the data to the posterior probability of the associated model, and it appears as the numerator or denominator of the Bayes factor;
  3. it is a measure of the goodness-of-fit of a model to the data $x$, in that $-2\log m(x)$ is asymptotically the BIC (Bayesian information criterion) of Schwarz (1978).
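
For roles 1 and 2, here is a minimal numerical sketch using the Beta-Binomial example above (the values of `n`, `x`, `a`, `b` are made up for illustration, and `scipy` is assumed available): it checks a quadrature evaluation of $m(x)$ against the closed form, uses $m(x)$ to normalise the posterior, and forms a Bayes factor against a point-null model.

```python
import math
from scipy import integrate, special, stats

# Hypothetical numbers, purely for illustration: x successes in n trials,
# with a Beta(a, b) prior on the success probability theta.
n, x = 10, 7
a, b = 2.0, 2.0

# The integrand f(x|theta) * pi(theta); m(x) is its integral over [0, 1].
def joint(theta):
    return stats.binom.pmf(x, n, theta) * stats.beta.pdf(theta, a, b)

m_numeric, _ = integrate.quad(joint, 0.0, 1.0)

# Closed form from the Beta-Binomial example: C(n, x) B(a+x, b+n-x) / B(a, b).
m_exact = math.comb(n, x) * special.beta(a + x, b + n - x) / special.beta(a, b)
print(m_numeric, m_exact)  # the two agree up to quadrature error

# Role 1: dividing f(x|theta) pi(theta) by m(x) yields the posterior density,
# which here is the conjugate Beta(a + x, b + n - x) density.
theta0 = 0.6
print(joint(theta0) / m_exact, stats.beta.pdf(theta0, a + x, b + n - x))

# Role 2: Bayes factor of the Beta-prior model against the point null
# theta = 1/2, whose marginal likelihood is just the binomial pmf at 1/2.
m_null = stats.binom.pmf(x, n, 0.5)
print("Bayes factor:", m_exact / m_null)
```

For role 3, recall the Schwarz (1978) approximation $\log m(x) \approx \log f(x|\hat\theta) - \frac{d}{2}\log n$ for a $d$-dimensional parameter and sample size $n$, so that $-2\log m(x)$ matches the usual definition of the BIC asymptotically.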

See also

Normalizing constant in Bayes theorem

Normalizing constant irrelevant in Bayes theorem?

Intuition of Bayesian normalizing constant
