Solved – Proof of the Derivation of the marginal and conditional Gaussian

Given a marginal Gaussian distribution for $x$ and a conditional Gaussian distribution for $y$ given $x$ in the form
$$p(x) = \mathcal N(x|\mu, \Lambda^{-1})$$
$$p(y|x) = \mathcal N(y|Ax + b, L^{-1})$$

the marginal distribution of $y$ and the conditional distribution of $x$ given $y$ are given by
$$p(y) = \mathcal N(y|A\mu + b, L^{-1} + A\Lambda^{-1}A^T)$$
$$p(x|y) = \mathcal N(x|\Sigma\{A^T L(y-b) + \Lambda \mu\}, \Sigma)$$

where $\Sigma = (\Lambda + A^TLA)^{-1}$.
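
Before reading a proof, it may help to convince yourself numerically. The sketch below (assuming hypothetical small dimensions and random positive-definite precision matrices) checks that the claimed parameters of $p(x|y)$ agree with the standard conditioning formulas applied to the joint Gaussian of $(x, y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
dx, dy = 3, 2                                   # hypothetical dimensions
mu = rng.normal(size=dx)
A = rng.normal(size=(dy, dx))
b = rng.normal(size=dy)
M = rng.normal(size=(dx, dx)); Lam = M @ M.T + dx * np.eye(dx)  # precision of p(x)
M = rng.normal(size=(dy, dy)); L = M @ M.T + dy * np.eye(dy)    # precision of p(y|x)

# Blocks of the joint covariance of (x, y) under y = Ax + b + noise.
Sxx = np.linalg.inv(Lam)
Sxy = Sxx @ A.T                                 # cov(x, y)
Syy = np.linalg.inv(L) + A @ Sxx @ A.T          # claimed marginal covariance of y

# Claimed posterior covariance and mean (for an arbitrary observed y) ...
Sigma = np.linalg.inv(Lam + A.T @ L @ A)
y = rng.normal(size=dy)
m_claimed = Sigma @ (A.T @ L @ (y - b) + Lam @ mu)

# ... must match the standard joint-Gaussian conditioning formulas.
Sigma_joint = Sxx - Sxy @ np.linalg.solve(Syy, Sxy.T)
m_joint = mu + Sxy @ np.linalg.solve(Syy, y - A @ mu - b)
print(np.allclose(Sigma, Sigma_joint))          # True
print(np.allclose(m_claimed, m_joint))          # True
```

The agreement of the two covariance expressions is exactly the Woodbury matrix identity, $(\Lambda + A^TLA)^{-1} = \Lambda^{-1} - \Lambda^{-1}A^T(L^{-1} + A\Lambda^{-1}A^T)^{-1}A\Lambda^{-1}$.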

This is taken from Christopher Bishop's textbook Pattern Recognition and Machine Learning (Section 2.3.3, equations 2.113–2.117).

The textbook gives a proof, but it is very brief and I could not fully follow it. Does anyone have a simple and detailed proof that can help me understand this result? Thanks.

Best Answer

Let's write the random variables $X$, $Y$ as $$ X = \mu + \varepsilon _x \\ Y = AX + b + \varepsilon _y $$ with independent $ \varepsilon _x \sim \mathcal N (0, \Lambda ^{-1})$, $ \varepsilon _y \sim \mathcal N (0, L ^{-1})$. Now plugging the first equation into the second gives

$$ Y = A\mu + b + A\varepsilon _x + \varepsilon _y. $$

This is a linear combination of normally distributed random variables and is therefore itself normally distributed, with expectation $A\mu + b$ and covariance matrix $A\Lambda ^{-1}A^T + L^{-1}$ (using $\operatorname{var}(AX) = A\operatorname{var}(X)A^T$ and the independence of $\varepsilon _x$ and $\varepsilon _y$). From this you get $p(y)$.
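
If you want to see this numerically, here is a minimal Monte Carlo sketch of exactly this construction (assuming hypothetical small dimensions and random positive-definite precision matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 3, 2                                   # hypothetical dimensions
mu = rng.normal(size=dx)
A = rng.normal(size=(dy, dx))
b = rng.normal(size=dy)
M = rng.normal(size=(dx, dx)); Lam = M @ M.T + dx * np.eye(dx)  # precision of p(x)
M = rng.normal(size=(dy, dy)); L = M @ M.T + dy * np.eye(dy)    # precision of p(y|x)

# Sample Y = A*mu + b + A*eps_x + eps_y directly from the two noise terms.
n = 500_000
eps_x = rng.multivariate_normal(np.zeros(dx), np.linalg.inv(Lam), size=n)
eps_y = rng.multivariate_normal(np.zeros(dy), np.linalg.inv(L), size=n)
y = A @ mu + b + eps_x @ A.T + eps_y

# Sample moments match the claimed mean and covariance up to Monte Carlo error.
print(np.allclose(y.mean(axis=0), A @ mu + b, atol=0.05))       # True
print(np.allclose(np.cov(y.T),
                  np.linalg.inv(L) + A @ np.linalg.inv(Lam) @ A.T,
                  atol=0.05))                                   # True
```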

The second fact is a bit more complicated and involves some tedious calculation. From Bayes' theorem it follows that $$ p(x|y) \propto p(y|x)\,p(x) \\ \propto \exp\left(-\tfrac{1}{2}\left[(y-Ax-b)^T L (y-Ax-b) + (x-\mu)^T \Lambda (x-\mu)\right]\right). $$

If you multiply everything out, drop the terms that do not depend on $x$, and complete the square in $x$, you get to something proportional to $$ \exp\left(-\tfrac{1}{2}\,\big(x - \Sigma\{A^T L(y-b) + \Lambda\mu\}\big)^T \Sigma^{-1} \big(x - \Sigma\{A^T L(y-b) + \Lambda\mu\}\big)\right) $$ with $\Sigma = (\Lambda + A^T L A)^{-1}$, which is proportional to your given normal.
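
Here is one way to write out that "tedious calculation", with all terms not involving $x$ absorbed into $\text{const}$ and the shorthand $m = \Sigma\left(A^T L (y-b) + \Lambda \mu\right)$ introduced for the posterior mean:

$$
\begin{aligned}
&-\tfrac{1}{2}\left[(y-Ax-b)^T L (y-Ax-b) + (x-\mu)^T \Lambda (x-\mu)\right] \\
&\quad= -\tfrac{1}{2}\left[x^T (\Lambda + A^T L A)\, x - 2\, x^T \left(A^T L (y-b) + \Lambda \mu\right)\right] + \text{const} \\
&\quad= -\tfrac{1}{2}\,(x - m)^T \Sigma^{-1} (x - m) + \text{const},
\end{aligned}
$$

where $\Sigma^{-1} = \Lambda + A^T L A$. The last line is the completion of the square, using $x^T \Sigma^{-1} x - 2 x^T \Sigma^{-1} m = (x-m)^T \Sigma^{-1} (x-m) - m^T \Sigma^{-1} m$ and noting that $m^T \Sigma^{-1} m$ does not depend on $x$.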
