How do we go about calculating a posterior with a prior $N(a, b)$ after observing $n$ data points? I assume that we have to compute the sample mean and variance of the data points and then do some calculation that combines them with the prior, but I'm not quite sure what the combination formula looks like.
Bayesian Updating – How to Update Bayesian Models with New Data
bayesian, conjugate-prior, normal-distribution
Related Solutions
The Gamma prior is not conjugate for the Gamma sampling model; see Conjugate prior for a Gamma distribution.
The following R code shows how to sample from the posterior of a Gamma model, using an adaptive MCMC sampler, with uniform priors on $[0,1000]$ for the scale and shape parameters; you can play with other priors.
set.seed(12345)
# Simulated data
data = rgamma(100,shape = 2.5,scale=97)
# Histogram and likelihood fit
hist(data)
library(MASS)
fitdistr(data,"gamma")
# -log posterior
mlogpost = function(par){
  loglik = sum(dgamma(data, shape = par[1], scale = par[2], log = TRUE))
  logprior = -2*log(1000)  # two independent Uniform(0, 1000) priors
  return(-loglik - logprior)
}
# Library for the MCMC sampler
library(Rtwalk)
# Initial point
init = c(2.5,97)
# Support of the posterior
Support <- function(x) {
  (x[1] > 0) & (x[2] > 0)
}
# Random initial points (see documentation of Rtwalk)
X0 <- function(x) { init + runif(2,-0.1,0.1) }
# Sampler
info <- Runtwalk( dim=2, Tr=105000, Obj=mlogpost, Supp=Support, x0=X0(), xp0=X0())
#burn in and thinning the posterior samples
ind = seq(5000,105000,by=10)
sh = info$output[,1][ind]  # shape samples (par[1])
sc = info$output[,2][ind]  # scale samples (par[2])
# histograms of the posterior samples
hist(sc)
hist(sh)
# summary of the posterior samples
summary(sc)
summary(sh)
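Since the t-walk is an MCMC sampler, it is worth checking that the chains mix well before trusting these summaries. A quick check, using the thinned sh and sc vectors defined above, is to inspect trace plots:
# trace plots of the thinned chains as a quick mixing check
plot(sh, type = "l")
plot(sc, type = "l")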
First of all, the formulas are defined in terms of variances, not standard deviations.
Second, the posterior variance is not the variance of your data but the variance of the estimated parameter $\mu$. As you can see from the description ("Normal with known variance $\sigma^2$"), this is the formula for estimating $\mu$ when $\sigma^2$ is known. The prior parameters $\mu_0$ and $\sigma_0^2$ are the parameters of the distribution of $\mu$, hence the assumed model is
$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \sigma_0^2) \end{align} $$
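For reference, after observing $x_1, \ldots, x_n$ with sample mean $\bar x$, the posterior for $\mu$ in this model is again normal:
$$ \mu \mid x_1, \ldots, x_n \sim \mathrm{Normal}\left( \frac{\mu_0/\sigma_0^2 + n\bar x/\sigma^2}{1/\sigma_0^2 + n/\sigma^2},\ \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1} \right) $$
A minimal R sketch of this update (the simulated data, the known variance, and the prior hyperparameters below are made up for illustration):
# conjugate update for mu with known variance
sigma2 = 4                                   # known data variance
mu0 = 0; sigma02 = 1                         # prior hyperparameters mu_0, sigma_0^2
x = rnorm(50, mean = 2, sd = sqrt(sigma2))   # simulated data
n = length(x); xbar = mean(x)
post_var = 1/(1/sigma02 + n/sigma2)
post_mean = post_var*(mu0/sigma02 + n*xbar/sigma2)
c(post_mean, post_var)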
When both $\mu$ and $\sigma^2$ are unknown and are to be estimated, you need a slightly more complicated model (in the Wikipedia table, under "$\mu$ and $\sigma^2$ Assuming exchangeability"):
$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu \mid \sigma^2 &\sim \mathrm{Normal}(\mu_0, \tfrac{\sigma^2}{\nu}) \\ \sigma^2 &\sim \mathrm{IG}(\alpha, \beta) \end{align} $$
where we first need to update the parameters of the inverse-gamma distribution for $\sigma^2$:
$$ \begin{align} \alpha' &= \alpha + \frac{n}{2} \\ \beta' &= \beta + \frac{1}{2}\sum_{i=1}^n (x_i -\bar x)^2 + \frac{n\nu(\bar x -\mu_0)^2}{2(n+\nu)} \end{align} $$
and then we can calculate the posterior mean of $\mu$ and the MAP point estimate of $\sigma^2$:
$$ \begin{align} \mu &= \frac{ \mu_0\nu + \bar x n }{\nu + n} \\ \operatorname{Mode}(\sigma^2) &= \frac{ \beta' }{ \alpha' + 1 } \end{align} $$
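A minimal R sketch of these updates (the simulated data and the prior hyperparameters $\mu_0, \nu, \alpha, \beta$ below are made up for illustration):
# conjugate normal-inverse-gamma update for unknown mean and variance
set.seed(1)
x = rnorm(100, mean = 5, sd = 2)          # simulated data
n = length(x); xbar = mean(x)
mu0 = 0; nu = 1; alpha = 2; beta = 2      # prior hyperparameters
alpha_p = alpha + n/2
beta_p = beta + 0.5*sum((x - xbar)^2) + n*nu*(xbar - mu0)^2/(2*(n + nu))
mu_post = (mu0*nu + xbar*n)/(nu + n)      # posterior mean of mu
sigma2_map = beta_p/(alpha_p + 1)         # MAP estimate of sigma^2
c(mu_post, sigma2_map)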
To learn more, refer to the "Conjugate Bayesian analysis of the Gaussian distribution" paper by Kevin Murphy, or "The Conjugate Prior for the Normal Distribution" notes by Michael Jordan (notice that there are slight differences between these two sources, and that some formulas are given for the precision $\tau$ rather than the variance), and M. DeGroot, Optimal Statistical Decisions, McGraw-Hill, 1970 (pp. 169-171).
Best Answer
The basic idea of Bayesian updating is that, given some data $X$ and a prior over the parameter of interest $\theta$, where the relation between the data and the parameter is described by a likelihood function, you use Bayes' theorem to obtain the posterior
$$ p(\theta \mid X) \propto p(X \mid \theta) \, p(\theta) $$
This can be done sequentially: after seeing the first data point $x_1$, the prior $p(\theta)$ is updated to the posterior $p(\theta \mid x_1)$; next, you can take the second data point $x_2$ and use the previously obtained posterior as your prior to update it once again, etc.
Let me give you an example. Imagine that you want to estimate the mean $\mu$ of a normal distribution whose variance $\sigma^2$ is known to you. In such a case we can use the normal-normal model. We assume a normal prior for $\mu$ with hyperparameters $\mu_0, \sigma_0^2$:
\begin{align} X\mid\mu &\sim \mathrm{Normal}(\mu,\ \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0,\ \sigma_0^2) \end{align}
Since the normal distribution is a conjugate prior for $\mu$ of the normal distribution, we have a closed-form solution for updating the prior after a single observation $x$:
\begin{align} E(\mu' \mid x) &= \frac{\sigma^2\mu_0 + \sigma^2_0 x}{\sigma^2 + \sigma^2_0} \\[7pt] \mathrm{Var}(\mu' \mid x) &= \frac{\sigma^2 \sigma^2_0}{\sigma^2 + \sigma^2_0} \end{align}
Unfortunately, such simple closed-form solutions are not available for more sophisticated problems, and you have to rely on optimization algorithms (for point estimates using the maximum a posteriori approach) or on MCMC simulation.
Below you can see a data example:
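A minimal R sketch of such sequential updating (the true mean, the known variance, and the prior hyperparameters below are made up for illustration):
# sequential normal-normal updating, one observation at a time
set.seed(123)
mu_true = 5; sigma2 = 4                  # true mean and known variance
x = rnorm(300, mean = mu_true, sd = sqrt(sigma2))
mu_n = 0; sigma2_n = 10                  # prior hyperparameters mu_0, sigma_0^2
post_means = numeric(length(x))
for (i in seq_along(x)) {
  # single-observation update: the posterior becomes the new prior
  mu_n = (sigma2*mu_n + sigma2_n*x[i])/(sigma2 + sigma2_n)
  sigma2_n = (sigma2*sigma2_n)/(sigma2 + sigma2_n)
  post_means[i] = mu_n
}
plot(post_means, type = "l", xlab = "observation", ylab = "posterior mean of mu")
abline(h = mu_true, col = "red")         # true value marked by the red line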
If you plot the results, you'll see how the posterior approaches the estimated value (its true value is marked by the red line) as new data accumulates.
To learn more, you can check these slides and the Conjugate Bayesian analysis of the Gaussian distribution paper by Kevin P. Murphy. See also Do Bayesian priors become irrelevant with large sample size? You can also check these notes and this blog entry for an accessible step-by-step introduction to Bayesian inference.