I am reading Bishop's *Pattern Recognition and Machine Learning*.
On page 73, in Section 2.1, I can't understand formula (2.19):
$$p(x=1|\mathcal{D})=\int_0^1 p(x=1|\mu)p(\mu|\mathcal{D})\text{d}\mu $$
The author says this is obtained from the sum and product rules.
The sum rule is:
$$p(X) = \sum_Y p(X,Y)$$
and the product rule is:
$$p(X,Y)=p(Y|X)p(X)$$
But I can't deduce the formula from these. Could you help me? Thanks very much.
Best Answer
\begin{align} p(x|\mathcal{D})&\overset{(a)}=\int_0^1p(x,\mu|\mathcal{D})d\mu \\ &\overset{(b)}=\int_0^1p(x|\mu,\mathcal{D})p(\mu|\mathcal{D})d\mu \\ &\overset{(c)}=\int_0^1p(x|\mu)p(\mu|\mathcal{D})d\mu \end{align}
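where (a) is the sum rule in its continuous form (the sum over $\mu$ becomes an integral over $[0,1]$), (b) is the product rule, with both rules applied while conditioning on $\mathcal{D}$ throughout, and (c) holds when $p(x|\mu,\mathcal{D}) = p(x|\mu)$, i.e., $x$ conditioned on $\mu$ is independent of $\mathcal{D}$. This is the case in Bishop's model, because the observations are assumed to be drawn independently given $\mu$. Setting $x=1$ gives (2.19).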
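As a sanity check, the integral can be evaluated numerically and compared with its closed form. This is a minimal sketch that assumes Bishop's Section 2.1 setup: Bernoulli observations with $p(x=1|\mu)=\mu$ and a $\text{Beta}(\mu|a,b)$ prior, so the posterior is $\text{Beta}(\mu|m+a, l+b)$, where $m$ and $l$ count the ones and zeros in $\mathcal{D}$. The data set, the values of `a` and `b`, and the function names are hypothetical, chosen only for illustration.

```python
import math


def beta_pdf(mu: float, a: float, b: float) -> float:
    """Density of Beta(mu | a, b), computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu))


def predictive_x1(data, a, b, n_grid=100_000):
    """Approximate p(x=1|D) = integral of p(x=1|mu) p(mu|D) dmu with the midpoint rule.

    Assumes (as in Bishop Sec. 2.1) a Beta(a, b) prior, so p(mu|D) = Beta(mu | m+a, l+b).
    """
    m = sum(data)          # number of observations equal to 1
    l = len(data) - m      # number of observations equal to 0
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        mu = (i + 0.5) * h  # midpoint of each cell, never exactly 0 or 1
        # integrand from step (c): p(x=1|mu) * p(mu|D), with p(x=1|mu) = mu
        total += mu * beta_pdf(mu, m + a, l + b) * h
    return total


def predictive_closed_form(data, a, b):
    """Posterior mean of mu, which equals p(x=1|D) under the same assumptions."""
    m = sum(data)
    l = len(data) - m
    return (m + a) / (m + a + l + b)


data = [1, 0, 1, 1, 0, 1]  # hypothetical observations: m = 4, l = 2
print(predictive_x1(data, 2.0, 2.0))           # approximately 0.6
print(predictive_closed_form(data, 2.0, 2.0))  # 0.6
```

Because $p(x=1|\mu)=\mu$, the integral in (2.19) is just the posterior mean of $\mu$, which is why the numerical and closed-form values agree.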