How to derive the conjugate prior of an exponential family distribution

Tags: bayesian, conditional-probability, distributions, exponential-family, probability

I am trying to derive the conjugate prior of the univariate Gaussian distribution over both the mean and the precision. I know that the prior I'm looking for is the normal-gamma distribution, but the idea is to derive this result.

It seems that you can write down the conjugate prior of an exponential family distribution immediately if you represent it in the canonical exponential family form. This link suggests a way to do just that. So far I have written the Gaussian density in this form and set up the relation below, but I do not see why the form of the conjugate prior is immediately obvious.

$$p(\mu, \lambda | D) \propto p(\mu,\lambda)p(D | \mu, \lambda)$$

Best Answer

The intuitive approach to conjugate priors is to try to deduce a family of distributions from the likelihood function. In the normal case, the likelihood is

\begin{align*}
\ell(\mu,\Sigma|x_1,\ldots,x_n)&\propto|\Sigma|^{-n/2}\,\exp\left\{ -\frac{1}{2}\sum_{i=1}^n (x_i-\mu)^\text{T}\Sigma^{-1}(x_i-\mu)\right\}\\
&\propto|\Sigma|^{-n/2}\,\exp\left\{-\frac{1}{2}\sum_{i=1}^n (x_i-\bar{x})^\text{T}\Sigma^{-1}(x_i-\bar{x})\right\}\\
&\qquad \times \exp\left\{-\frac{n}{2}(\bar{x}-\mu)^\text{T}\Sigma^{-1}(\bar{x}-\mu)\right\}\\
&\propto |\Sigma|^{-n/2}\,\exp\left\{-\frac{1}{2}\text{tr}\left(\Sigma^{-1}S_n \right)-\frac{n}{2}(\bar{x}-\mu)^\text{T}\Sigma^{-1}(\bar{x}-\mu)\right\}
\end{align*}

where

$$S_n=\sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\text{T}$$

So we have three items in this likelihood:

  1. a power of $|\Sigma|$;
  2. an exponential of a trace of $\Sigma^{-1}$ times another matrix;
  3. an exponential of a quadratic in $\mu$ with matrix $\Sigma^{-1}$.

All three terms are closed under multiplication, i.e. multiplying two terms of the same type gives another term of that type:

  1. $|\Sigma|^a\times |\Sigma|^b = |\Sigma|^{a+b}$;
  2. $\exp\left\{-\text{tr}\left(\Sigma^{-1}A\right)\right\}\times\exp\left\{-\text{tr}\left(\Sigma^{-1}B\right)\right\}=\exp\left\{-\text{tr}\left(\Sigma^{-1}[A+B]\right)\right\}$;
  3. $\exp\left\{-(a-\mu)^\text{T}\alpha\Sigma^{-1}(a-\mu)\right\}\times\exp\left\{-(b-\mu)^\text{T}\beta\Sigma^{-1}(b-\mu)\right\}$ remains an exponential of a quadratic in $\mu$ with matrix $\Sigma^{-1}$ (with an extra term of the form $\exp\left\{-\text{tr}\left(\Sigma^{-1}A\right)\right\}$, because the sum of the two quadratics is not a perfect square in $\mu$ and completing the square leaves a remainder that does not depend on $\mu$).
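
To make item 3 concrete, here is the completion of the square written out. The symbol $c$ for the weighted mean is introduced here for illustration and is not part of the original answer; $\alpha,\beta>0$ is assumed.

\begin{align*}
\alpha(a-\mu)^\text{T}\Sigma^{-1}(a-\mu)+\beta(b-\mu)^\text{T}\Sigma^{-1}(b-\mu)
&=(\alpha+\beta)(\mu-c)^\text{T}\Sigma^{-1}(\mu-c)+\frac{\alpha\beta}{\alpha+\beta}(a-b)^\text{T}\Sigma^{-1}(a-b),\\
c&=\frac{\alpha a+\beta b}{\alpha+\beta}
\end{align*}

The remainder can be rewritten as a trace, $$\frac{\alpha\beta}{\alpha+\beta}(a-b)^\text{T}\Sigma^{-1}(a-b)=\text{tr}\left(\Sigma^{-1}A\right),\qquad A=\frac{\alpha\beta}{\alpha+\beta}(a-b)(a-b)^\text{T}$$ so it is absorbed by the term of type 2.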

This means that the likelihood induces a shape of prior that remains stable under multiplication by another term of the same shape, which is one way of defining conjugacy. So, if I take my prior to be

$$\pi(\mu,\Sigma)\propto|\Sigma|^{-\gamma/2}\,\exp\left\{-\frac{1}{2}\text{tr}\left(\Sigma^{-1}\Xi \right)-\frac{\nu}{2}(\xi-\mu)^\text{T}\Sigma^{-1}(\xi-\mu)\right\}$$

the posterior will have the same form, except that the hyperparameters $\gamma,\Xi,\nu,\xi$ are updated by the data.
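
Multiplying this prior by the likelihood and applying the three closure rules gives, as far as I can work it out, the following update (the primed symbols are notation added here, not the original answer's):

$$\gamma'=\gamma+n,\qquad \nu'=\nu+n,\qquad \xi'=\frac{\nu\xi+n\bar{x}}{\nu+n},\qquad \Xi'=\Xi+S_n+\frac{n\nu}{n+\nu}(\bar{x}-\xi)(\bar{x}-\xi)^\text{T}$$

A minimal numerical sanity check, specialised to the univariate case of the question so that $\Sigma$ is a scalar variance `sigma2`. The function names and the test values of the hyperparameters are hypothetical choices for this sketch. It compares log prior plus log likelihood against the log of the same kernel evaluated at the updated hyperparameters, all with the same normalising constants dropped as in the formulas above.

```python
import math
import random


def log_kernel(mu, sigma2, gamma, Xi, nu, xi):
    """Log of sigma2^(-gamma/2) * exp{-Xi/(2 sigma2) - nu (xi - mu)^2 / (2 sigma2)}."""
    return (-0.5 * gamma * math.log(sigma2)
            - 0.5 * Xi / sigma2
            - 0.5 * nu * (xi - mu) ** 2 / sigma2)


def log_likelihood(mu, sigma2, xs):
    """Gaussian log-likelihood with the (2*pi) constant dropped."""
    n = len(xs)
    return (-0.5 * n * math.log(sigma2)
            - 0.5 * sum((x - mu) ** 2 for x in xs) / sigma2)


def posterior_hyperparameters(xs, gamma, Xi, nu, xi):
    """Return the updated (gamma, Xi, nu, xi) from the formulas above."""
    n = len(xs)
    xbar = sum(xs) / n
    S_n = sum((x - xbar) ** 2 for x in xs)
    return (gamma + n,
            Xi + S_n + n * nu / (n + nu) * (xbar - xi) ** 2,
            nu + n,
            (nu * xi + n * xbar) / (nu + n))


def max_discrepancy(seed=0, n=5, trials=20):
    """Largest |log prior + log likelihood - log posterior kernel| over random points."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    prior = (5.0, 1.0, 2.0, 0.0)  # arbitrary illustrative (gamma, Xi, nu, xi)
    post = posterior_hyperparameters(xs, *prior)
    worst = 0.0
    for _ in range(trials):
        mu = rng.gauss(0.0, 1.0)
        sigma2 = rng.uniform(0.5, 3.0)
        lhs = log_kernel(mu, sigma2, *prior) + log_likelihood(mu, sigma2, xs)
        rhs = log_kernel(mu, sigma2, *post)
        worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    # Should be at floating-point rounding level if the update is right.
    print(max_discrepancy())
```

If the two sides agree at every randomly drawn $(\mu,\sigma^2)$, the product of prior and likelihood is a kernel of the same family, which is exactly the conjugacy claim.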
