Solved – How to complete the square with normal likelihood and normal prior

Tags: bayesian, likelihood, normal distribution, prior

How do I complete the square from the point I have left off at, and is this correct so far?

I have a normal prior for $\beta$ of the form $p(\beta|\sigma^2)\sim \mathcal{N}(0,\sigma^2V)$, to get:

$p(\beta|\sigma^2)=(2\pi\sigma^2V)^\frac{p}{2}\exp[-\frac{1}{2\sigma^2}\beta^T\beta]$

where $\beta^T\beta$ is $\sum\limits_{i=1}^p \beta_i^2$.

My likelihood is a normal distribution for the data points $y$, of the form $p(y|\beta,\sigma^2)\sim\mathcal{N}(B\beta,\sigma^2I)$, to get:

$p(y|\beta,\sigma^2)=(2\pi \sigma^2V)^\frac{n}{2}\exp[-\frac{1}{2\sigma^2}({\bf y}-{\bf B}{\bf \beta})^T({\bf y}-{\bf B}{\bf \beta})]$

(Note that $\beta$ is a vector too; \bf does not render it as bold here.)

To get my posterior for $\beta$ I combined the above, took the exponential parts only, and then expanded to get:

$\exp[-\frac{1}{2\sigma^2}({\bf y}^T{\bf y}-{\bf y}^T{\bf B}\beta-\beta{\bf B}^T{\bf y}-\beta^T{\bf B}^T{\bf B}\beta)]\exp[-\frac{1}{2\sigma^2}({\bf \beta}^T{\bf B})]$.

I dropped the ${\bf y}^T{\bf y}$ term, as it is not a function of $\beta$.

Putting into one expression without the exponential:

$-\frac{1}{2\sigma^2}(-{\bf y}^T{\bf B}\beta-\beta{\bf B}^T{\bf y}-\beta^T{\bf B}^T{\bf B}\beta+{\bf \beta}^T{\bf B})$.

I know I need to combine the similar terms and get the expression into the form of a multivariate normal distribution, which is what I am aiming for, but I am unsure how to do this. Do I have to add an extra term to the expression to get it into the correct form?

Note: This is not homework, it's a project, but my working knowledge of Bayesian methods is not good at all, so I need to understand the working out. I intend to integrate out $\beta$ and then $\sigma^2$ after getting the posterior into multivariate normal form.

Best Answer

I'll start from scratch, since the original post has some math typos like wrong signs, dropping the $V$ matrix, etc.

You've specified the prior $p(\beta)=\mathcal{N}( 0, \sigma^2 V )$ and the likelihood $p(y | \beta ) = \mathcal{N}( B\beta, \sigma^2I )$.

We can write each of these purely as expressions of terms inside the $\exp$ that depend on $\beta$, grouping all terms unrelated to $\beta$ into a single constant:

$\log p( \beta ) + \mbox{const} = -\frac{1}{2\sigma^2} \beta^T V^{-1} \beta$

$\log p( y | \beta ) + \mbox{const} = -\frac{1}{2\sigma^2}( \beta^T B^TB \beta - 2y^T B \beta ) \quad$ (note that $y^TB\beta = \beta^T B^T y$ always)
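The scalar-transpose fact used above is easy to verify numerically; here is a minimal sketch with made-up shapes and random values (nothing here comes from the original problem):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
B = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

# y^T B beta is a 1x1 quantity (a scalar), so it equals its own
# transpose, which is beta^T B^T y.
assert np.isclose(y @ B @ beta, beta @ B.T @ y)
```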

Adding these in log space and collecting like terms yields the unnormalized log posterior

$\log p( \beta | y ) + \mbox{const} = -\frac{1}{2\sigma^2}( \beta^T(V^{-1} + B^TB)\beta - 2y^T B \beta )\quad$ (1)

... here, we've used the standard identity $x^TAx + x^TCx = x^T(A+C)x$ for any vectors $x$ and matrices $A,C$ of appropriate size.
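As a numerical sanity check on (1) (a sketch only; the values of $B$, $V$, $y$, $\sigma^2$ below are arbitrary made-up inputs, not anything from the problem), we can confirm that the sum of the two log-densities differs from the right-hand side of (1) only by a $\beta$-independent constant, by comparing differences at two points:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
B = rng.standard_normal((n, p))   # hypothetical design matrix
V = 2.0 * np.eye(p)               # hypothetical prior covariance scale
y = rng.standard_normal(n)
sigma2 = 0.7
Vinv = np.linalg.inv(V)

def log_joint(beta):
    """log p(beta) + log p(y|beta), dropping beta-independent constants."""
    log_prior = -0.5 / sigma2 * beta @ Vinv @ beta
    resid = y - B @ beta
    log_lik = -0.5 / sigma2 * resid @ resid
    return log_prior + log_lik

def rhs_eq1(beta):
    """Right-hand side of eqn (1)."""
    return -0.5 / sigma2 * (beta @ (Vinv + B.T @ B) @ beta - 2 * y @ B @ beta)

b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
# Taking a difference cancels the constant, so the two sides should agree
# to machine precision at any pair of beta values.
assert np.isclose(log_joint(b1) - log_joint(b2), rhs_eq1(b1) - rhs_eq1(b2))
```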

OK, our goal is now to "complete" the square. We'd like an expression of the form below, which would indicate that the posterior for $\beta$ is Gaussian.

$\log p( \beta | y ) + \mbox{const} = -\frac{1}{2\sigma^2}(\beta - \mu_p)^T \Lambda_p (\beta - \mu_p )$, where expanding the quadratic gives $(\beta - \mu_p)^T \Lambda_p (\beta - \mu_p ) = \beta^T \Lambda_p \beta -2\mu_p^T \Lambda_p \beta + \mu_p^T \Lambda_p \mu_p$

where the parameters $\mu_p, \Lambda_p$ define the posterior mean and (scaled) inverse covariance respectively: the posterior is $\mathcal{N}(\mu_p, \sigma^2\Lambda_p^{-1})$.

Well, by inspection eqn. (1) looks a lot like this form if we set

$\Lambda_p = V^{-1} + B^TB \quad$ and $\quad \mu_p = \Lambda_p^{-1}B^Ty$

In detail, we can show that this substitution creates each necessary term from (1):

quadratic term: $\beta^T \Lambda_p \beta = \beta^T( V^{-1} + B^TB)\beta$

linear term: $\mu_p^T \Lambda_p \beta = ( \Lambda_p^{-1}B^Ty )^T \Lambda_p \beta = y^T B \Lambda_p^{-1} \Lambda_p \beta = y^T B \beta$

.... here we used the facts $(AB)^T = B^T A^T$ and $(\Lambda_p^{-1})^T =\Lambda_p^{-1}$ (since $\Lambda_p$ is symmetric, so is its inverse).

However, this leaves us with a pesky extra term $\mu_p^T \Lambda_p \mu_p$. To account for it, we simply subtract it again. Thus, we can directly substitute our $\mu_p, \Lambda_p$ parameters into (1) to get

$\log p( \beta | y ) + \mbox{const} = -\frac{1}{2\sigma^2}\left[ (\beta-\mu_p)^T\Lambda_p(\beta-\mu_p) - \mu_p^T\Lambda_p\mu_p \right]$

Since that last term is constant with respect to $\beta$, we can just fold it into the big normalization constant on the left-hand side, and we've achieved our goal.
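To see the completed square in action, here is a numeric sketch (again with made-up $B$, $V$, $y$, $\sigma^2$) checking that the quadratic form in (1) and the completed-square form agree exactly at an arbitrary $\beta$, with $\mu_p, \Lambda_p$ computed as derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
B = rng.standard_normal((n, p))   # hypothetical design matrix
V = np.eye(p)                     # hypothetical prior covariance scale
y = rng.standard_normal(n)
sigma2 = 0.5
Vinv = np.linalg.inv(V)

Lam = Vinv + B.T @ B                  # Lambda_p = V^{-1} + B^T B
mu = np.linalg.solve(Lam, B.T @ y)    # mu_p = Lambda_p^{-1} B^T y

beta = rng.standard_normal(p)
eq1 = beta @ Lam @ beta - 2 * y @ B @ beta
completed = (beta - mu) @ Lam @ (beta - mu) - mu @ Lam @ mu
# The two quadratic forms are identical for every beta.
assert np.isclose(eq1, completed)

# The posterior covariance picks up the 1/sigma^2 prefactor:
post_cov = sigma2 * np.linalg.inv(Lam)
```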
