Bayesian Updating – Bayesian Updating with Conjugate Priors Using Closed Form Expressions

bayesian, conjugate-prior, normal-distribution, posterior, prior

I have two data sets of scalar values: one large data set (about 700 data points) and one small data set (80 data points). I would like to update the large data set with the small one using Bayes' theorem, and so create another large data set (the posterior).

The large data set serves as the prior and is assumed to be normally distributed, and the posterior is assumed normal as well. This was motivated by the existence of closed-form expressions for the posterior distribution parameters of the conjugate prior, see https://en.wikipedia.org/wiki/Conjugate_prior (the first row in the table for continuous distributions).

However, when I substitute into the closed-form expressions for the posterior mean and variance, using the mean and variance of the prior (estimated from the large data set) and of the local data (estimated from the small data set), the resulting posterior distribution does not make sense.

Do I misunderstand that I can simply substitute into these closed-form expressions the known values in order to get the posterior distribution?

Best Answer

First of all, the formulas are defined in terms of variance, not standard deviations.

Second, the variance of the posterior is not the variance of your data but the variance of the estimated parameter $\mu$. As you can see from the description ("Normal with known variance $\sigma^2$"), this is the formula for estimating $\mu$ when $\sigma^2$ is known. The prior parameters $\mu_0$ and $\sigma_0^2$ are the parameters of the distribution of $\mu$, hence the assumed model is

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \sigma_0^2) \end{align} $$
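To make the distinction concrete, here is a minimal sketch of the known-variance update in Python. The data values and prior hyperparameters are hypothetical, chosen only to illustrate the formulas; note that the returned variance shrinks as $n$ grows, because it is the uncertainty about $\mu$, not the spread of the data.

```python
import numpy as np

def posterior_mu(x, sigma2, mu0, sigma0_2):
    """Posterior of mu given data x with KNOWN data variance sigma2,
    under the prior mu ~ Normal(mu0, sigma0_2).
    Returns the (mean, variance) of the posterior distribution of mu."""
    n = len(x)
    # Precisions (inverse variances) add: prior precision + n * data precision.
    post_var = 1.0 / (1.0 / sigma0_2 + n / sigma2)
    # Posterior mean is the precision-weighted average of prior mean and data.
    post_mean = post_var * (mu0 / sigma0_2 + np.sum(x) / sigma2)
    return post_mean, post_var

# Hypothetical small data set and prior hyperparameters:
x = np.array([4.8, 5.1, 5.3, 4.9])
mean, var = posterior_mu(x, sigma2=1.0, mu0=5.0, sigma0_2=2.0)
```

Note that `var` here is the variance of $\mu$ itself; the predictive variance of a new observation would be `var + sigma2`.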

When both $\mu$ and $\sigma^2$ are unknown and are to be estimated, you need a slightly more complicated model (in the Wikipedia table under "$\mu$ and $\sigma^2$ Assuming exchangeability"):

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \tfrac{\sigma^2}{\nu}) \\ \sigma^2 &\sim \mathrm{IG}(\alpha, \beta) \end{align} $$

where first we update the parameters of the inverse-gamma distribution for $\sigma^2$:

$$ \begin{align} \alpha' &= \alpha + \frac{n}{2} \\ \beta' &= \beta + \frac{1}{2}\sum_{i=1}^n (x_i -\bar x)^2 + \frac{n\nu(\bar x -\mu_0)^2}{2(n+\nu)} \end{align} $$

and then we can calculate the posterior mean of $\mu$ and the MAP point estimate of $\sigma^2$:

$$ \begin{align} \mu &= \frac{ \mu_0\nu + \bar x n }{\nu + n} \\ \operatorname{Mode}(\sigma^2) &= \frac{ \beta' }{ \alpha' + 1 } \end{align} $$
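The whole update can be sketched as a short Python function. This is a direct transcription of the formulas above, with hypothetical example inputs; in your setting, $\mu_0$, $\nu$, $\alpha$, $\beta$ would come from the large (prior) data set and `x` would be the small data set.

```python
import numpy as np

def nig_update(x, mu0, nu, alpha, beta):
    """Update a Normal-inverse-gamma prior (mu0, nu, alpha, beta) with data x.
    Returns the posterior hyperparameters and the MAP estimate of sigma^2."""
    n = len(x)
    xbar = np.mean(x)
    # Posterior parameters of the inverse-gamma distribution for sigma^2:
    alpha_p = alpha + n / 2.0
    beta_p = (beta
              + 0.5 * np.sum((x - xbar) ** 2)
              + n * nu * (xbar - mu0) ** 2 / (2.0 * (n + nu)))
    # Posterior mean of mu and updated pseudo-count:
    mu_p = (mu0 * nu + xbar * n) / (nu + n)
    nu_p = nu + n
    # MAP (mode) of the inverse-gamma posterior for sigma^2:
    sigma2_map = beta_p / (alpha_p + 1.0)
    return mu_p, nu_p, alpha_p, beta_p, sigma2_map

# Hypothetical data and prior hyperparameters:
mu_p, nu_p, a_p, b_p, s2_map = nig_update(
    np.array([1.0, 3.0]), mu0=0.0, nu=1.0, alpha=2.0, beta=1.0)
```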

To learn more, refer to the paper "Conjugate Bayesian analysis of the Gaussian distribution" by Kevin Murphy, or the notes "The Conjugate Prior for the Normal Distribution" by Michael Jordan (notice that there are slight differences between those two sources, and that some formulas are given in terms of the precision $\tau$ rather than the variance), and M. DeGroot, Optimal Statistical Decisions, McGraw-Hill, 1970 (pp. 169-171).

Related Question