[Math] Normal distribution with unknown mean and variance

bayesian, normal distribution, probability

I'm reading Bayesian Data Analysis by Gelman et al. and I came across an example which starts like this:

Example. Normal distribution with unknown mean and variance.

We illustrate the approximate normal distribution with a simple
theoretical example. Let $y_1, . . . , y_n$ be independent
observations from a $N(\mu, \sigma^2)$ distribution, and, for
simplicity, we assume a uniform prior density for $(\mu, \log
\sigma)$. We set up a normal approximation to the posterior
distribution of $(\mu, \log \sigma)$, which has the virtue of
restricting $\sigma$ to positive values. To construct the
approximation, we need the second derivatives of the log posterior
density,

$$\log p(\mu, \log\sigma \,|\, y) = \text{constant} - n \log \sigma -
\frac{1}{2\sigma^2} \left((n - 1)s^2 + n(\overline{y} - \mu)^2\right),$$

where $\overline{y}=\frac1n\sum_{i=1}^n y_i$ and
$s^2=\frac{1}{n-1}\sum_{i=1}^n(y_i-\overline{y})^2$.

This log posterior is escaping me for some reason… I'm not fully following how the author arrived at the expression above. I did try to calculate the posterior as the product of the likelihood and the prior, but I didn't get the same result.

Is there a change of variables etc. happening under the hood in this example or what?

In summary, my question is: how do we arrive at the log posterior expression? What are the steps?

Thank you for your help!

Best Answer

For this you start by writing out your log-likelihood $$ \begin{align} \log p(y\,|\,\mu,\sigma) &= \sum_i \log p(y_i \,|\, \mu,\sigma) \\ &= \sum_i \left(-\frac{(y_i-\mu)^2}{2\sigma^2} - \log\left(\sqrt{2\pi\sigma^2}\right) \right)\\ &= -n\log \sigma - \frac{1}{2\sigma^2}\sum_i(y_i-\mu)^2 + \text{constant}. \end{align} $$ What happens next is the following decomposition of the sum-of-squares term: $$ \begin{align} \sum_i (y_i - \mu)^2 &= \sum_i \left( (y_i - \bar{y}) + (\bar{y} - \mu)\right)^2 \\ &= \sum_i \left((y_i-\bar{y})-(\mu-\bar{y})\right)^2. \end{align} $$ You might like to take it from here?
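If you want to sanity-check where that decomposition is heading before doing the algebra, a few lines of code (nothing from the book, just a numerical illustration with made-up data) confirm the identity $\sum_i (y_i - \mu)^2 = (n-1)s^2 + n(\overline{y} - \mu)^2$ that the quoted log posterior relies on:

```python
import numpy as np

# Numerical check of the decomposition the algebra is heading towards:
#   sum_i (y_i - mu)^2 = (n - 1) s^2 + n (ybar - mu)^2
# with made-up data and an arbitrary value of mu.
rng = np.random.default_rng(1)
y = rng.normal(size=20)
mu = 0.7
n, ybar = len(y), y.mean()
s2 = y.var(ddof=1)          # sample variance with the (n - 1) denominator

lhs = np.sum((y - mu) ** 2)
rhs = (n - 1) * s2 + n * (ybar - mu) ** 2
print(lhs, rhs)             # the two numbers agree to floating-point precision
```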

Ok, if you are happy with that rearrangement of the likelihood, then looking at the Bayesian aspects we are after the posterior $$ p(\mu,\log \sigma \,|\, y) \propto p(y\,|\,\mu,\log \sigma)\, \pi(\mu,\log \sigma), $$ where $\pi(\cdot)$ is our prior. This is assumed to be uniform, which will also sometimes be written as $$ \pi(\log \sigma) \propto 1. $$ Now, in specifying a uniform prior there is a little bit of hand-waving going on: this is an improper prior if we want $\log \sigma$ to have infinite support. I'm sure Gelman discusses this more in the book; a few relevant points are that improper priors can still lead to proper posteriors, and that we can compactify the infinite support if we want. Anyway, assuming that $\mu$ also has a uniform prior (independent of $\log\sigma$), then taking the log of the posterior I have $$ \begin{align} \log p(\mu,\log \sigma \,|\, y) &= \log p(y \,|\, \mu,\log \sigma ) + \log \pi(\log \sigma) + \log \pi(\mu) + \text{constant} \\ &= \log p(y\,|\,\mu,\log \sigma) + \text{constant}, \end{align} $$ where to get the second line we used that both priors are proportional to a constant. Now, as I mentioned, the $\log \sigma$ on the right-hand side is just a reparameterisation of the usual log-likelihood term, which makes no difference as far as writing the term out goes. Where it *will* make a difference is when we proceed to carry out differentiation to construct an approximation: we will be differentiating with respect to $\log \sigma$ and not $\sigma$.
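To make that last point concrete (this is my own working, not a quote from the book), differentiating with respect to $\log\sigma$ means using $\partial/\partial(\log\sigma) = \sigma\,\partial/\partial\sigma$, so for instance $$ \frac{\partial}{\partial(\log\sigma)} \log p(\mu,\log\sigma\,|\,y) = -n + \frac{1}{\sigma^2}\left((n-1)s^2 + n(\overline{y}-\mu)^2\right), $$ which is the quantity that gets set to zero and differentiated again when building the approximation, rather than the derivative with respect to $\sigma$ itself.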

Finally, why carry out this change of variables? Well, I suspect what the author is going to do is construct a Laplace approximation, which amounts to a Gaussian approximation around the posterior mode. Now, if we were to construct this in the original $(\mu,\sigma)$ parameterisation, then the Gaussian has support $\mathbb{R}^2$ and so would give positive probability to negative values of $\sigma$; working in $(\mu,\log\sigma)$ avoids that.
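For what it's worth, here is a small sketch of that construction in code (my own illustration with made-up data, not the book's derivation): evaluate the log posterior in $(\mu, \log\sigma)$, locate its mode, take a numerical Hessian there, and invert its negative to get the covariance of the approximating Gaussian.

```python
import numpy as np

# Made-up data, just to illustrate the construction (not from the book).
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=50)
n, ybar = len(y), y.mean()
s2 = y.var(ddof=1)          # sample variance with the (n - 1) denominator

def log_post(mu, log_sigma):
    """Log posterior in (mu, log sigma), up to an additive constant."""
    sigma2 = np.exp(2.0 * log_sigma)
    return -n * log_sigma - ((n - 1) * s2 + n * (ybar - mu) ** 2) / (2.0 * sigma2)

# Posterior mode (my working from setting the first derivatives to zero --
# worth re-deriving yourself): mu_hat = ybar, sigma_hat^2 = sum (y_i - ybar)^2 / n.
mu_hat = ybar
log_sigma_hat = 0.5 * np.log((n - 1) * s2 / n)

def hessian(f, x, h=1e-4):
    """Central-difference Hessian of a function of two scalar arguments."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(*(x + ei + ej)) - f(*(x + ei - ej))
                       - f(*(x - ei + ej)) + f(*(x - ei - ej))) / (4.0 * h * h)
    return H

H = hessian(log_post, [mu_hat, log_sigma_hat])
cov = np.linalg.inv(-H)     # covariance of the normal (Laplace) approximation
print("mode:", mu_hat, log_sigma_hat)
print("approximate covariance:\n", cov)
```

For this particular model the mode happens to be available in closed form, which the sketch uses; if you don't trust my calculus there, running a numerical optimiser over `log_post` should land on the same point.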