[Math] Conjugate prior of a normal distribution with unknown mean


I'm following these notes to compute the conjugate prior of a normal distribution with unknown mean and known variance. At some point they claim:

$p(D|\mu) \propto \exp\left(-\frac{n}{2\sigma^2}(\overline{x}-\mu)^2\right) \propto N\left(\overline{x}\mid\mu,\frac{\sigma^2}{n}\right)$

but then they claim that the natural conjugate prior has the form:

$p(\mu) \propto \exp\left(-\frac{n}{2\sigma_0^2}(\mu-\mu_0)^2\right) \propto N(\mu|\mu_0,\sigma_0^2)$

Is this mathematically correct?

Contrast this with these other notes, which write the likelihood as:

$L(\mu|x_1,\ldots,x_n) = c\exp\left(-\frac{n(\mu-\overline{x})^2}{2\sigma^2}-\frac{\sum_i (x_i-\overline{x})^2}{2\sigma^2}\right)$
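For reference, a sketch of where this form comes from, assuming $x_1,\ldots,x_n$ are i.i.d. $N(\mu,\sigma^2)$ so that $c = (2\pi\sigma^2)^{-n/2}$: writing $x_i-\mu = (x_i-\overline{x}) + (\overline{x}-\mu)$, the cross term vanishes because $\sum_i (x_i-\overline{x}) = 0$, which gives the identity

\begin{equation} \sum_{i=1}^n (x_i-\mu)^2 = n(\mu-\overline{x})^2 + \sum_{i=1}^n (x_i-\overline{x})^2 . \end{equation}

Only the first term on the right depends on $\mu$; the second is fixed by the data.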

Then they claim that one should focus on the term $\exp\left(-\frac{n(\mu-\overline{x})^2}{2\sigma^2}\right)$, which is said to be clearly proportional to a normal density in $\mu$!

What is going on here? How can I compute the conjugate prior of a normal distribution with unknown mean?

References:

A good account of this case can be found in DeGroot's "Probability and Statistics".

Best Answer

Assume you model the likelihood through the sufficient statistic $\bar{X}$: \begin{equation} P(D|\mu) \propto \exp\left(-\frac{n}{2\sigma^2}(\bar{X}-\mu)^2\right) \propto N\left(\bar{X}\mid\mu,\frac{\sigma^2}{n}\right) \end{equation} and use the following as a prior for $\mu$ (with the factor $n$ in the exponent, its variance is $\sigma_0^2/n$): \begin{equation} P(\mu) \propto \exp\left(-\frac{n}{2\sigma_0^2}(\mu-\mu_0)^2\right) \propto N\left(\mu\mid\mu_0,\frac{\sigma_0^2}{n}\right) \end{equation}

We need to show that the product of these two distributions is, as a function of $\mu$, proportional to the kernel of a normal distribution, so we can ignore all normalizing constants and any terms not containing $\mu$.

\begin{equation} \begin{split} P(D|\mu)P(\mu) & \propto \exp\left(-\frac{n}{2\sigma^2}(\bar{X}-\mu)^2\right) \exp\left(-\frac{n}{2\sigma_0^2}(\mu-\mu_0)^2\right) \\ & = \exp\left(-\frac{n}{2}\left[\frac{1}{\sigma^2}(\bar{X}^2 - 2\bar{X}\mu + \mu^2)+\frac{1}{\sigma_0^2}(\mu^2 - 2\mu\mu_0 + \mu_0^2)\right]\right) \end{split} \end{equation}

Throwing away the terms that do not contain $\mu$ (namely $\bar{X}^2/\sigma^2$ and $\mu_0^2/\sigma_0^2$), we get: \begin{equation} \begin{split} P(D|\mu)P(\mu) & \propto \exp\left(-\frac{n}{2}\left[\frac{1}{\sigma^2}(\mu^2 - 2\bar{X}\mu)+\frac{1}{\sigma_0^2}(\mu^2 - 2\mu\mu_0)\right]\right) \\ & = \exp\left(-\frac{n}{2}\left[\mu^2\left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right) - 2\mu\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\right]\right) \end{split} \end{equation}

The exponent is a quadratic in $\mu$; to recognize it as the kernel of a normal distribution, we complete the square:

\begin{equation} \begin{split} \exp\left(-\frac{n}{2}\left[\mu^2\left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right) - 2\mu\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\right]\right) & = \exp\left(-\frac{n}{2}\left[a(\mu-h)^2 + k\right]\right) \end{split} \end{equation}

Here $a$ and $h$ are constants, and $k$ does not involve $\mu$, so it can be absorbed into the proportionality constant.

Matching the coefficients of $\mu^2$ and $\mu$, we have $a = \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}$ and $h = \frac{\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}}{\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}}$ (and $k = -ah^2$, which we drop).
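The coefficient matching can be spelled out by expanding the completed square and comparing it, power by power, with the bracket $\mu^2\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_0^2}\right) - 2\mu\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)$, which has no constant term:

\begin{equation} \begin{split} a(\mu-h)^2 + k & = a\mu^2 - 2ah\mu + (ah^2 + k) \\ \mu^2:\quad a & = \frac{1}{\sigma^2}+\frac{1}{\sigma_0^2} \\ \mu:\quad -2ah & = -2\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right) \;\Rightarrow\; h = \frac{1}{a}\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right) \\ \text{const}:\quad ah^2 + k & = 0 \;\Rightarrow\; k = -ah^2 \end{split} \end{equation}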

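So $h$ is the posterior mean, and $na$, with $a = \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}$, is the posterior precision (the reciprocal of the posterior variance); the factor $n$ comes from the $n$ in front of the bracket: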

\begin{equation} \begin{split} \exp\left(-\frac{n}{2}\left[\mu^2\left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right) - 2\mu\left(\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\right]\right) & \propto \exp\left(-\frac{n}{2}\left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\left(\mu-\frac{\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}}{\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}}\right)^2\right) \end{split} \end{equation}

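This is the kernel of a $\textrm{N}\left(\frac{\frac{\bar{X}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}}{\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}}, \frac{1}{n}\left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1}\right)$ distribution or, simplified, $\textrm{N}\left(\frac{\sigma_0^2\bar{X}+\sigma^2\mu_0}{\sigma^2+\sigma_0^2}, \frac{\sigma^2\sigma_0^2}{n(\sigma^2+\sigma_0^2)}\right)$. Since the posterior is normal, like the prior, the normal prior is conjugate.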
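As a sanity check on the algebra, here is a minimal Python sketch (standard library only; the function names and example values are made up for illustration) that compares the closed-form posterior against a brute-force normalization of likelihood × prior on a fine grid of $\mu$ values. It uses the same prior kernel $\exp\left(-\frac{n}{2\sigma_0^2}(\mu-\mu_0)^2\right)$ as above; the grid range and resolution are assumptions chosen to cover essentially all of the posterior mass for these values.

```python
import math


def closed_form_posterior(xbar, n, sigma2, mu0, sigma02):
    """Posterior (mean, variance) from completing the square, for the
    likelihood kernel exp(-n/(2*sigma2)*(xbar-mu)**2) and the prior kernel
    exp(-n/(2*sigma02)*(mu-mu0)**2)."""
    a = 1.0 / sigma2 + 1.0 / sigma02
    h = (xbar / sigma2 + mu0 / sigma02) / a
    return h, 1.0 / (n * a)


def grid_posterior(xbar, n, sigma2, mu0, sigma02, lo=-20.0, hi=20.0, steps=40001):
    """Brute-force (mean, variance): evaluate likelihood * prior on a uniform
    grid of mu values and normalize numerically."""
    step = (hi - lo) / (steps - 1)
    mus = [lo + i * step for i in range(steps)]
    # Unnormalized log posterior = log-likelihood kernel + log-prior kernel.
    logs = [-n / (2 * sigma2) * (xbar - m) ** 2 - n / (2 * sigma02) * (m - mu0) ** 2
            for m in mus]
    peak = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(lg - peak) for lg in logs]
    total = sum(weights)
    mean = sum(m * w for m, w in zip(mus, weights)) / total
    var = sum((m - mean) ** 2 * w for m, w in zip(mus, weights)) / total
    return mean, var


if __name__ == "__main__":
    # Hypothetical example: xbar = 1.3 from n = 10 draws, sigma^2 = 4,
    # prior centred at mu0 = 0 with sigma_0^2 = 2.
    print(closed_form_posterior(1.3, 10, 4.0, 0.0, 2.0))
    print(grid_posterior(1.3, 10, 4.0, 0.0, 2.0))
```

The two printed (mean, variance) pairs should agree to many decimal places; a sign slip in the completing-the-square step would show up immediately as a mismatch in the mean.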

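This agrees with the result given on Wikipedia, up to simplification. The conjugate prior specified in the question also carries the factor $n$ in its exponent, which the Wikipedia version doesn't. You can absorb it into the prior variance, though: writing $\tau_0^2 = \sigma_0^2/n$ turns the result above into the familiar $\textrm{N}\left(\frac{n\bar{X}/\sigma^2 + \mu_0/\tau_0^2}{n/\sigma^2 + 1/\tau_0^2}, \left(\frac{n}{\sigma^2}+\frac{1}{\tau_0^2}\right)^{-1}\right)$.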
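To see the "absorb $n$ into the prior variance" remark concretely, here is a small Python sketch (function names and numbers are illustrative assumptions, not from any library). It compares the posterior derived above with the usual precision-weighted update for a $N(\mu_0, \tau_0^2)$ prior, plugging in $\tau_0^2 = \sigma_0^2/n$, where $\tau_0^2$ is a name introduced here for the prior variance.

```python
import math


def posterior_from_answer(xbar, n, sigma2, mu0, sigma02):
    """Posterior (mean, variance) derived above, for the prior kernel
    exp(-n/(2*sigma02)*(mu-mu0)**2), i.e. prior variance sigma02 / n."""
    a = 1.0 / sigma2 + 1.0 / sigma02
    return (xbar / sigma2 + mu0 / sigma02) / a, 1.0 / (n * a)


def posterior_standard(xbar, n, sigma2, mu0, tau02):
    """Usual known-variance update for a N(mu0, tau02) prior: precisions add,
    and the posterior mean is the precision-weighted average of xbar and mu0."""
    precision = n / sigma2 + 1.0 / tau02
    mean = (n * xbar / sigma2 + mu0 / tau02) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    xbar, n, sigma2, mu0, sigma02 = 1.3, 10, 4.0, 0.0, 2.0  # hypothetical values
    # Absorbing n into the prior variance: tau0^2 = sigma_0^2 / n.
    print(posterior_from_answer(xbar, n, sigma2, mu0, sigma02))
    print(posterior_standard(xbar, n, sigma2, mu0, sigma02 / n))
```

The two functions return the same mean and variance up to floating-point rounding, because with $\tau_0^2 = \sigma_0^2/n$ the posterior precision $n/\sigma^2 + 1/\tau_0^2$ equals $n\left(1/\sigma^2 + 1/\sigma_0^2\right)$.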