How to Compute a Posterior Density Estimate from a Prior and Likelihood?

bayesian, computational-statistics, simulation

I am trying to understand how to use Bayes' theorem to calculate a posterior, but I am getting stuck with the computational approach. For example, in the following case it is not clear to me how to take the product of the prior and likelihood and then calculate the posterior:

For this example, I am interested in calculating the posterior distribution of $\mu$, and I use a standard normal prior on $\mu$, i.e. $\mu \sim N(0, 1)$. However, what I really want to know is how to calculate the posterior from a prior on $\mu$ that is represented by samples (e.g., an MCMC chain), so I will use 1000 samples as my starting point.

  • sample 1000 draws from the prior:

    set.seed(0)
    prior.mu      <- 0
    prior.sigma   <- 1
    prior.samples <- sort(rnorm(1000, prior.mu, prior.sigma))
    
  • make some observations:

    observations <- c(0.4, 0.5, 0.8, 0.1)
    
  • and calculate the likelihood, e.g. $p(y | \mu, \sigma)$:

    likelihood <- prod(dnorm(observations, mean(prior.samples), sd(prior.samples)))
    

what I don't quite understand is:

  1. when / how to multiply the prior by the likelihood?
  2. when / how to normalize the posterior density?

Please note: I am interested in a general computational solution that would generalize to problems with no analytical solution.

Best Answer

You have several things mixed up. The theory talks about multiplying the prior distribution by the likelihood, not samples from the prior distribution. It is also not clear what the prior is for: is this a prior on the mean of something, or on something else?

You also have things reversed in the likelihood: your observations should be the x argument of dnorm, with either prior draws or known fixed constants as the mean and standard deviation. And even then it would really be the product of four calls to dnorm, one per observation, each with the same mean and standard deviation.

What is really not clear is what you are trying to do. What is your question? Which parameters are you interested in? What prior(s) do you have on those parameters? Are there other parameters? Do you have priors or fixed values for those?

Trying to go about things the way you currently are will only confuse you more until you work out exactly what your question is and work from there.

The following was added after the original question was edited.

You are still missing some pieces, and probably not understanding everything, but we can start from where you are.

I think you are confusing a few concepts. There is the likelihood, which describes the relationship between the data and the parameters; you are using the normal likelihood, which has two parameters, the mean and the standard deviation (or variance, or precision). Then there are the prior distributions on those parameters. You have specified a normal prior with mean 0 and SD 1, but that mean and standard deviation are completely different from the mean and standard deviation of the likelihood. To be complete you need to either know the likelihood SD or place a prior on it; for simplicity (but less realism) I will assume we know the likelihood SD is $\frac12$ (no good reason other than it works and is different from 1).
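To make that concrete, the model assumed for the rest of this answer can be written as

$$y_i \mid \mu \sim N(\mu,\, 0.5^2), \quad i = 1, \dots, 4, \qquad \mu \sim N(0,\, 1^2),$$

with the likelihood SD fixed at $\frac12$ and only $\mu$ unknown.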

So we can start, similar to what you did, by generating draws from the prior:

> obs <- c(0.4, 0.5, 0.8, 0.1)
> pri <- rnorm(10000, 0, 1)

Now we need to compute the likelihoods. Each is based on a prior draw of the mean, the observed data, and the known value of the SD. The dnorm function gives us the likelihood of a single point, but we need to multiply together the values for all of the observations; here is a function to do that:

> likfun <- function(theta) {
+ sapply( theta, function(t) prod( dnorm(obs, t, 0.5) ) )
+ }
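One caveat before relying on likfun: a product of densities underflows to zero once there are more than a handful of observations, so in practice the likelihood is usually accumulated on the log scale. A sketch of that variant (same model; it can stand in for likfun below because the resampling step only needs weights up to a constant factor):

> loglikfun <- function(theta) {
+ sapply( theta, function(t) sum( dnorm(obs, t, 0.5, log=TRUE) ) )
+ }
> ll <- loglikfun(pri)
> # exp(ll - max(ll)) is proportional to the likelihood and safe from underflow

With only four observations the direct product is fine, so the rest of the answer keeps likfun.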

Now we can compute the likelihood for each prior draw of the mean:

> tmp <- likfun(pri)

Now, to get the posterior, we need a new type of draw. One approach, essentially sampling importance resampling (and a close cousin of rejection sampling), is to resample from the prior draws of the mean with probability proportional to each draw's likelihood (this is the closest thing to the multiplication step you were asking about):

> post <- sample( pri, 100000, replace=TRUE, prob=tmp )
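This is also where your second question (normalization) is handled: sample() rescales whatever is passed to prob so that the weights sum to 1, which is exactly the division by the marginal likelihood in Bayes' theorem. Written out explicitly, an equivalent draw would be:

> w <- tmp/sum(tmp)
> post <- sample( pri, 100000, replace=TRUE, prob=w )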

Now we can look at the results of the posterior draws:

> mean(post)
[1] 0.4205842
> sd(post)
[1] 0.2421079
> 
> hist(post)
> abline(v=mean(post), col='green')

and compare the above results to the closed-form values from the theory:

> (1/1^2*mean(pri) + length(obs)/0.5^2 * mean(obs))/( 1/1^2 + length(obs)/0.5^2 )
[1] 0.4233263
> sqrt(1/(1+4*4))
[1] 0.2425356
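Those two lines implement the standard normal-normal conjugate update: with a $N(\mu_0, \sigma_0^2)$ prior, known likelihood SD $\sigma$, and $n$ observations with mean $\bar{y}$, the posterior is

$$\mu \mid y \sim N\!\left(\frac{\mu_0/\sigma_0^2 + n\bar{y}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2},\ \frac{1}{1/\sigma_0^2 + n/\sigma^2}\right),$$

here with $\mu_0$ estimated by mean(pri), $\sigma_0 = 1$, $\sigma = \frac12$, and $n = 4$.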

Not a bad approximation, but it would probably work better to use a built-in MCMC tool to draw from the posterior. Most of these tools sample one point at a time, not in batches like the above.
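For a flavor of what those one-point-at-a-time samplers do, here is a minimal random-walk Metropolis sketch for the same model (the proposal SD of 0.5 and the iteration count are arbitrary tuning choices for this illustration, not part of the original recipe):

    # random-walk Metropolis for mu; prior N(0, 1), likelihood SD fixed at 0.5
    logpost <- function(m) dnorm(m, 0, 1, log=TRUE) + sum(dnorm(obs, m, 0.5, log=TRUE))
    chain <- numeric(10000)
    cur   <- 0                        # start the chain at the prior mean
    for (i in seq_along(chain)) {
      prop <- cur + rnorm(1, 0, 0.5)  # symmetric proposal, so the plain ratio suffices
      if (log(runif(1)) < logpost(prop) - logpost(cur)) cur <- prop
      chain[i] <- cur
    }
    mean(chain); sd(chain)            # should land near 0.42 and 0.24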

More realistically, we would not know the SD of the likelihood and would need a prior for it as well (often an inverse-gamma or scaled inverse-$\chi^2$ prior on the variance, or equivalently a gamma prior on the precision), but then the posterior is more complicated to compute (this is where MCMC comes in handy) and there is no closed form to compare with.
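For illustration, the batch resampling above does extend to that case: draw the mean and the precision jointly from their priors and weight by the joint likelihood. A minimal sketch, assuming (purely my choice for the example) a Gamma(1, 1) prior on the precision $1/\sigma^2$:

    pri.mu  <- rnorm(10000, 0, 1)
    pri.tau <- rgamma(10000, shape=1, rate=1)  # assumed prior on the precision
    lik  <- mapply(function(m, tau) prod(dnorm(obs, m, 1/sqrt(tau))), pri.mu, pri.tau)
    keep <- sample.int(10000, 100000, replace=TRUE, prob=lik)
    post.mu <- pri.mu[keep]
    post.sd <- 1/sqrt(pri.tau[keep])

The same caveat applies with extra force: resampling prior draws becomes inefficient quickly as parameters are added, which is why dedicated MCMC tools are the practical answer.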

The general solution is to use existing tools for the MCMC calculations, such as WinBUGS or OpenBUGS (the BRugs package provides an interface between R and BUGS), or R packages such as LearnBayes.
