Solved – Marginal likelihood: why integration is used

bayesian, continuous data, marginal-distribution

From Wikipedia:

Given a set of independent identically distributed data points
$\mathbb{X}=(x_1,\ldots,x_n)$, where $x_i \sim p(x_i|\theta)$
according to some probability distribution parameterized by $\theta$, where $\theta$
itself is a random variable described by a distribution, i.e. $\theta \sim p(\theta|\alpha)$,
the marginal likelihood in general asks what the probability
$p(\mathbb{X}|\alpha)$ is, where $\theta$ has been
marginalized out (integrated out):

$$p(\mathbb{X}|\alpha) = \int_\theta p(\mathbb{X}|\theta) \, p(\theta|\alpha)\ \operatorname{d}\!\theta $$

What I don't understand is the integration here. If $p(\cdot)$ is a probability distribution, then as far as I understand this should reduce to just the Bayes formula: $p(\mathbb{X}|\theta) \, p(\theta|\alpha)$.

Best Answer

It comes from the chain rule of probability, not from Bayes' rule. Bayes' rule is not exactly what you have stated; it also involves marginalizing out a random variable. For any two random variables $X$ and $Y$ with a joint distribution $p(X,Y)$, you can compute the marginal distribution of $X$ as

$$ p(X) = \int_Y p(X,Y) dY $$
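As a quick numerical sanity check (a minimal sketch, not part of the original answer; the bivariate normal and its parameters are illustrative assumptions), you can marginalize a joint density by integrating over one variable and compare against the known closed-form marginal:

```python
import numpy as np
from scipy import integrate, stats

# Hypothetical bivariate normal joint p(x, y).
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
joint = stats.multivariate_normal(mean=mean, cov=cov)

x0 = 0.7  # point at which to evaluate the marginal p(x)

# p(x0) = integral of p(x0, y) dy, computed by numerical quadrature over y
marginal_by_integration, _ = integrate.quad(
    lambda y: joint.pdf([x0, y]), -np.inf, np.inf)

# The marginal of a multivariate normal is again normal, so we can
# check against the closed form N(mean[0], cov[0, 0]).
marginal_closed_form = stats.norm(loc=mean[0], scale=np.sqrt(cov[0, 0])).pdf(x0)

print(marginal_by_integration)  # ~0.3123
print(marginal_closed_form)     # same value up to quadrature error
```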

Similarly, in your case the random variables are $X$, $\theta$, and $\alpha$. Therefore,

$$ p(X|\alpha) = \int_{\theta} p(X,\theta|\alpha) d\theta = \int_{\theta} p(X|\theta,\alpha) p(\theta|\alpha) d\theta $$

Note that here we apply the chain rule $p(X,\theta|\alpha)=p(X|\theta,\alpha)\,p(\theta|\alpha)$.

Now, since $X$ is generated as $X \sim p(X|\theta)$, $X$ is conditionally independent of $\alpha$ given $\theta$, so $p(X|\theta,\alpha)=p(X|\theta)$. Combining these, we get

$$ p(X|\alpha) = \int_{\theta} p(X|\theta)p(\theta|\alpha) d\theta $$
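To make the final formula concrete, here is a hedged sketch: the model choice (i.i.d. Bernoulli data with a Beta prior whose hyperparameters play the role of $\alpha$) is an illustrative assumption, not something stated in the question. It computes $p(X|\alpha)$ by numerically integrating $p(X|\theta)\,p(\theta|\alpha)$ over $\theta$ and compares the result with the Beta-Bernoulli closed form.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import betaln

# Hypothetical data: n Bernoulli(theta) flips with k successes.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
n, k = len(x), int(x.sum())

# Hyperparameters alpha = (a, b) of a Beta prior: theta ~ Beta(a, b).
a, b = 2.0, 2.0

def likelihood(theta):
    # p(X | theta) for i.i.d. Bernoulli data
    return theta**k * (1.0 - theta)**(n - k)

def prior(theta):
    # p(theta | alpha)
    return stats.beta(a, b).pdf(theta)

# p(X | alpha) = integral over [0, 1] of p(X | theta) p(theta | alpha) dtheta
marginal_by_integration, _ = integrate.quad(
    lambda t: likelihood(t) * prior(t), 0.0, 1.0)

# Closed form for this conjugate model: B(a + k, b + n - k) / B(a, b)
marginal_closed_form = np.exp(betaln(a + k, b + n - k) - betaln(a, b))

print(marginal_by_integration)
print(marginal_closed_form)  # agrees up to quadrature error
```

The integral is exactly the averaging the answer describes: each candidate $\theta$ contributes its likelihood for the data, weighted by how plausible the prior $p(\theta|\alpha)$ says that $\theta$ is.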
