Bayesian Updating – Bayesian Updating with Conjugate Priors Using Closed Form Expressions

bayesian, conjugate-prior, normal-distribution, posterior, prior

I have two data sets of scalar values: one large data set (about 700 data points) and one small data set (80 data points). I would like to update the large data set with the small one using Bayes' theorem, and so create another large data set (the posterior).

The large data set serves as the prior and is assumed to be normally distributed, and the posterior is assumed normal as well. This was motivated by the existence of closed-form expressions for the posterior distribution parameters of the conjugate prior, see https://en.wikipedia.org/wiki/Conjugate_prior (the first row in the table for continuous distributions).

However, when I substitute into the closed-form expressions for the posterior mean and variance, using the mean and variance of the prior (estimated from the large data set) and of the local data (estimated from the small data set), the resulting posterior distribution does not make sense.

Do I misunderstand that I can simply substitute into these closed-form expressions the known values in order to get the posterior distribution?

Best Answer

First of all, the formulas are defined in terms of variance, not standard deviations.

Second, the variance of the posterior is not the variance of your data but the variance of the estimated parameter $\mu$. As you can see from the description ("Normal with known variance $\sigma^2$"), this is the formula for estimating $\mu$ when $\sigma^2$ is known. The prior parameters $\mu_0$ and $\sigma_0^2$ are the parameters of the distribution of $\mu$, hence the assumed model is

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \sigma_0^2) \end{align} $$
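To make the distinction concrete, here is a minimal sketch of the known-variance update in Python. The data values and prior hyperparameters are hypothetical, chosen only to illustrate the formulas; note that the returned variance shrinks as $n$ grows, because it is the uncertainty about $\mu$, not the spread of the data.

```python
import numpy as np

def posterior_mu(x, sigma2, mu0, sigma0_2):
    """Posterior of mu given data x with KNOWN data variance sigma2,
    under the prior mu ~ Normal(mu0, sigma0_2).
    Returns the (mean, variance) of the posterior distribution of mu."""
    n = len(x)
    # Precisions (inverse variances) add: prior precision + n * data precision.
    post_var = 1.0 / (1.0 / sigma0_2 + n / sigma2)
    # Posterior mean is the precision-weighted average of prior mean and data.
    post_mean = post_var * (mu0 / sigma0_2 + np.sum(x) / sigma2)
    return post_mean, post_var

# Hypothetical small data set and prior hyperparameters:
x = np.array([4.8, 5.1, 5.3, 4.9])
mean, var = posterior_mu(x, sigma2=1.0, mu0=5.0, sigma0_2=2.0)
```

Note that `var` here is the variance of $\mu$ itself; the predictive variance of a new observation would be `var + sigma2`.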

When both $\mu$ and $\sigma^2$ are unknown and are to be estimated, you need a slightly more complicated model (in the Wikipedia table under "$\mu$ and $\sigma^2$ Assuming exchangeability"):

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \tfrac{\sigma^2}{\nu}) \\ \sigma^2 &\sim \mathrm{IG}(\alpha, \beta) \end{align} $$

where first we update the parameters of the inverse-gamma distribution for $\sigma^2$:

$$ \begin{align} \alpha' &= \alpha + \frac{n}{2} \\ \beta' &= \beta + \frac{1}{2}\sum_{i=1}^n (x_i -\bar x)^2 + \frac{n\nu(\bar x -\mu_0)^2}{2(n+\nu)} \end{align} $$

and then we can calculate the posterior mean of $\mu$ and the MAP point estimate of $\sigma^2$:

$$ \begin{align} \mu &= \frac{ \mu_0\nu + \bar x n }{\nu + n} \\ \operatorname{Mode}(\sigma^2) &= \frac{ \beta' }{ \alpha' + 1 } \end{align} $$
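The whole update can be sketched as a short Python function. This is a direct transcription of the formulas above, with hypothetical example inputs; in your setting, $\mu_0$, $\nu$, $\alpha$, $\beta$ would come from the large (prior) data set and `x` would be the small data set.

```python
import numpy as np

def nig_update(x, mu0, nu, alpha, beta):
    """Update a Normal-inverse-gamma prior (mu0, nu, alpha, beta) with data x.
    Returns the posterior hyperparameters and the MAP estimate of sigma^2."""
    n = len(x)
    xbar = np.mean(x)
    # Posterior parameters of the inverse-gamma distribution for sigma^2:
    alpha_p = alpha + n / 2.0
    beta_p = (beta
              + 0.5 * np.sum((x - xbar) ** 2)
              + n * nu * (xbar - mu0) ** 2 / (2.0 * (n + nu)))
    # Posterior mean of mu and updated pseudo-count:
    mu_p = (mu0 * nu + xbar * n) / (nu + n)
    nu_p = nu + n
    # MAP (mode) of the inverse-gamma posterior for sigma^2:
    sigma2_map = beta_p / (alpha_p + 1.0)
    return mu_p, nu_p, alpha_p, beta_p, sigma2_map

# Hypothetical data and prior hyperparameters:
mu_p, nu_p, a_p, b_p, s2_map = nig_update(
    np.array([1.0, 3.0]), mu0=0.0, nu=1.0, alpha=2.0, beta=1.0)
```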

To learn more, refer to the paper "Conjugate Bayesian analysis of the Gaussian distribution" by Kevin Murphy, or the notes "The Conjugate Prior for the Normal Distribution" by Michael Jordan (notice that there are slight differences between those two sources, and that some formulas are given in terms of the precision $\tau$ rather than the variance), and M. DeGroot, Optimal Statistical Decisions, McGraw-Hill, 1970 (pp. 169-171).

Related Question