Solved – Marginal likelihood vs. prior predictive probability

bayesian, prior

In the Bayesian framework, it seems to me that the marginal likelihood and the prior predictive distribution/probability are equal. Is that the case? Or does this only hold for single data points? Why distinguish between these two terms?

Marginal likelihood (evidence):
$$
p(\mathbb{X}|\alpha) = \int_\theta p(\mathbb{X}|\theta) \, p(\theta|\alpha)\ \operatorname{d}\!\theta
$$

Prior predictive distribution:
$$
p(\tilde{x}|\alpha) = \int_\theta p(\tilde{x}|\theta) \, p(\theta|\alpha)\ \operatorname{d}\!\theta
$$
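
To make the comparison concrete, here is a minimal numerical sketch of what I mean, assuming a toy Beta-Bernoulli model (the hyperparameters `a, b` play the role of $\alpha$; the model and data are just illustrative choices, not part of the definitions):

```python
import numpy as np
from scipy import integrate, stats

# Toy Beta-Bernoulli model: theta ~ Beta(a, b), x_i | theta ~ Bernoulli(theta).
# Here (a, b) plays the role of the hyperparameter alpha above.
a, b = 2.0, 3.0
X = np.array([1, 0, 1, 1])  # observed data set (illustrative)

# Marginal likelihood p(X | alpha): integrate the joint likelihood of the
# whole data set against the prior.
def integrand_marginal(theta):
    likelihood = np.prod(theta**X * (1.0 - theta)**(1 - X))
    return likelihood * stats.beta.pdf(theta, a, b)

marginal_likelihood, _ = integrate.quad(integrand_marginal, 0.0, 1.0)

# Prior predictive p(x_tilde = 1 | alpha): the same kind of integral, but with
# the likelihood of a single new observation in place of the whole data set.
def integrand_predictive(theta):
    return theta * stats.beta.pdf(theta, a, b)

prior_predictive_one, _ = integrate.quad(integrand_predictive, 0.0, 1.0)

print(marginal_likelihood)   # one number for the entire observed X
print(prior_predictive_one)  # equals E[theta] = a / (a + b) = 0.4
```

Both expressions are the same kind of integral; the only difference I can see is whether the likelihood factor covers the whole data set $\mathbb{X}$ or a single $\tilde{x}$.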

Best Answer

I'm assuming $\alpha$ contains the values that define your prior for $\theta$. When this is the case, we typically omit $\alpha$ from the notation and write the marginal likelihood as $$p(\mathbb{X}) = \int p(\mathbb{X}|\theta) \, p(\theta)\ \operatorname{d}\!\theta.$$ The prior predictive distribution is not well defined, in that you haven't told me what you want predicted: it differs depending on whether you are predicting a single data point or a whole set of observations. The notation makes this confusing, because $p(\tilde{x}|\theta)$ is a different object depending on what $\tilde{x}$ stands for.

If you want to predict data with exactly the same structure as the data you observed, then the marginal likelihood is just that prior predictive distribution evaluated at the observed data $\mathbb{X}$: the marginal likelihood is a single number, whereas the prior predictive distribution is a full probability density (or mass) function.
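
To see this numerically, here is a sketch under the same toy Beta-Bernoulli assumption used in the question's code: for a Beta$(a, b)$ prior, the prior predictive of a specific binary sequence of length $n$ with $s$ successes has the closed form $B(a+s,\, b+n-s)/B(a, b)$, and evaluating it at the observed sequence reproduces the marginal likelihood.

```python
import numpy as np
from scipy.special import betaln

# Same toy Beta-Bernoulli setup as in the question's sketch.
a, b = 2.0, 3.0
X = np.array([1, 0, 1, 1])
s, n = X.sum(), len(X)

# Prior predictive of a specific binary sequence of length n with s successes:
#   p(X | a, b) = B(a + s, b + n - s) / B(a, b)
# Evaluated at the observed sequence, this is exactly the marginal likelihood.
closed_form = np.exp(betaln(a + s, b + n - s) - betaln(a, b))
print(closed_form)  # ~0.042857, matching the numerical integral above
```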