Solved – Can anyone explain conjugate priors in the simplest possible terms

bayesian, conditional-probability, conjugate-prior

I have been trying to understand the idea of conjugate priors in Bayesian statistics for a while but I simply don't get it. Can anyone explain the idea in the simplest possible terms, perhaps using the "Gaussian prior" as an example?

Best Answer

A prior for a parameter will almost always have some specific functional form (written in terms of the density, generally). Let's say we restrict ourselves to one particular family of distributions, in which case choosing our prior reduces to choosing the parameters of that family.

For example, consider a normal model $Y_i \stackrel{\text{iid}}{\sim} N(\mu,\sigma^2)$. For simplicity, let's also take $\sigma^2$ as known. This part of the model - the model for the data - determines the likelihood function.
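Concretely, for observed values $y_1,\dots,y_n$, and keeping only the factors that involve $\mu$, that likelihood is

$$L(\mu) \propto \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2\right).$$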

To complete our Bayesian model, here we need a prior for $\mu$.

As mentioned above, we commonly specify some distributional family for our prior on $\mu$, so that we only have to choose the parameters of that distribution. Often prior information is fairly vague - roughly where we want the probability to concentrate, say - rather than of a very specific functional form, and choosing the parameters (for example, to match a prior mean and variance) gives us enough freedom to model what we want.

If it turns out that the posterior for $\mu$ is from the same family as the prior, then that prior is said to be "conjugate".

(What makes it turn out to be conjugate is the way the prior combines with the likelihood.)

So in this case, let's take a Gaussian prior for $\mu$ (say $\mu\sim N(\theta,\tau^2)$). If we do that, we see that the posterior for $\mu$ is also Gaussian. Consequently, the Gaussian prior was a conjugate prior for our model above.
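Specifically, writing $\bar y$ for the sample mean, the standard normal-normal update works out to

$$\mu \mid y \;\sim\; N\!\left(\frac{\theta/\tau^2 + n\bar y/\sigma^2}{1/\tau^2 + n/\sigma^2},\ \frac{1}{1/\tau^2 + n/\sigma^2}\right),$$

i.e. another Gaussian, whose mean is a precision-weighted average of the prior mean $\theta$ and the sample mean $\bar y$.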

That's all there is to it really -- if the posterior is from the same family as the prior, it's a conjugate prior.

In simple cases you can identify a conjugate prior by inspection of the likelihood. For example, consider a binomial likelihood: dropping the constants, it looks like a beta density in $p$, and because of the way the powers of $p$ and $(1-p)$ each combine, multiplying it by a beta prior again gives a product of powers of $p$ and $(1-p)$ ... so we can see immediately from the likelihood that the beta will be a conjugate prior for $p$ in the binomial model.
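As a quick sanity check of that beta-binomial conjugacy, here is a minimal Python sketch (the counts $n$, $k$ and the $\text{Beta}(a,b)$ prior parameters are arbitrary illustrative choices). It multiplies the beta prior density by the binomial likelihood on a grid, normalises numerically, and compares the result with the $\text{Beta}(a+k,\,b+n-k)$ density that conjugacy predicts:

```python
import numpy as np
from scipy import stats

# Illustrative data: k successes in n Bernoulli trials.
n, k = 20, 13
# Illustrative Beta(a, b) prior on p.
a, b = 2.0, 2.0

# Grid over the interior of (0, 1).
p = np.linspace(0.001, 0.999, 999)
dp = p[1] - p[0]

# Unnormalised posterior: beta prior density times binomial likelihood.
unnorm = stats.beta.pdf(p, a, b) * stats.binom.pmf(k, n, p)
numeric_post = unnorm / (unnorm.sum() * dp)  # normalise on the grid

# What conjugacy predicts: a Beta(a + k, b + n - k) posterior.
conjugate_post = stats.beta.pdf(p, a + k, b + n - k)

# The two curves agree up to grid/rounding error.
print(np.max(np.abs(numeric_post - conjugate_post)))
```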

In the Gaussian case it's easiest to see that this will happen by considering the log-densities: the log-likelihood is quadratic in $\mu$, and the sum of two quadratics is quadratic, so a quadratic log-prior plus a quadratic log-likelihood gives a quadratic log-posterior (the coefficient of the highest-order term in each is of course negative).
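Spelled out, and keeping only the terms that involve $\mu$, the log-prior plus log-likelihood is

$$-\frac{(\mu-\theta)^2}{2\tau^2} \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2 \;=\; -\frac{1}{2}\left(\frac{1}{\tau^2}+\frac{n}{\sigma^2}\right)\mu^2 + \left(\frac{\theta}{\tau^2}+\frac{n\bar y}{\sigma^2}\right)\mu + \text{const},$$

and completing the square in $\mu$ recovers exactly the normal posterior given earlier.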
