Solved – Explanation that the prior predictive (marginal) distribution follows from prior and sampling distributions

bayesian, conditioning, marginal-distribution, prediction, prior

While I have a vague intuition that this makes sense, I am interested in the formal demonstration that the prior predictive distribution in Bayesian inference is equal to the integral over $\theta$ of the product of the prior distribution $p(\theta)$ and the sampling distribution $p(y|\theta)$, such that:

$$p(y) = \int_{\theta} p(\theta) p(y|\theta)\text{d}\theta.$$

Could one say that the integral makes the distribution unconditional (i.e. it removes the conditionality) by integrating over all possible parameters?

If so, is there a more formal explanation?

Best Answer

The equation follows from the definition of marginal distribution:

$$ p(y) = \int_\theta p(y, \theta)\,\text{d}\theta $$

And from factoring the joint probability of the data and parameters into the sampling distribution times the prior: $$p(y, \theta) = p(y|\theta)p(\theta)$$

(If this is confusing, divide both sides by $p(\theta)$ to get the familiar definition of conditional probability.)
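Substituting the factorization into the marginal gives exactly the identity in the question:

$$p(y) = \int_\theta p(y, \theta)\,\text{d}\theta = \int_\theta p(y|\theta)\,p(\theta)\,\text{d}\theta.$$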

More plainly, and as noted in the comments, "prior predictive distribution" is simply the Bayesian term for the marginal distribution of the data taken over the prior: it names the interpretation of a particular marginal distribution.
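If a concrete check helps, here is a small Python sketch. The specific model (a Beta(2, 3) prior with a Binomial(n = 10) sampling distribution) is an illustrative choice of mine, not part of the post; it compares the closed-form Beta-Binomial prior predictive with the numerical integral of prior times sampling distribution, and the two agree.

```python
# Numerical check of p(y) = ∫ p(θ) p(y|θ) dθ for a Beta-Binomial model.
# The model below (Beta(2, 3) prior, Binomial(n=10) likelihood) is an
# illustrative assumption, not something taken from the question or answer.
from scipy import stats, integrate

a, b, n = 2.0, 3.0, 10   # prior hyperparameters and number of trials
y = 4                    # one particular value of the data

# Closed form: the prior predictive of this model is Beta-Binomial.
closed_form = stats.betabinom.pmf(y, n, a, b)

# Direct integration of prior times sampling distribution over θ in [0, 1].
integrand = lambda theta: stats.beta.pdf(theta, a, b) * stats.binom.pmf(y, n, theta)
numeric, _ = integrate.quad(integrand, 0.0, 1.0)

print(closed_form, numeric)   # the two values agree up to quadrature error
```

The same comparison works for any y from 0 to n, since the integral is just the marginal of the joint evaluated at that y.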
