Solved – Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution

bayesian, inference, markov-chain-montecarlo, posterior, simulation

My understanding is that when using a Bayesian approach to estimate parameter values:

  • The posterior distribution is the combination of the prior distribution and the likelihood function.
  • We simulate this by generating a sample from the posterior distribution (e.g., using a Metropolis-Hastings algorithm to propose values and accept them with a probability based on the ratio of posterior densities at the proposed and current values; see the sketch after this list).
  • Once we have generated this sample, we use it to approximate the posterior distribution and quantities such as its mean.
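
For reference, here is a minimal sketch of the Metropolis-Hastings step (a made-up one-dimensional example with a standard-normal target; all names are illustrative). Note that a proposal is accepted with probability $\min(1, \text{posterior density ratio})$, implemented by comparing the log-ratio to the log of a uniform draw, rather than by a fixed threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Unnormalized log-posterior (log prior + log likelihood);
    # a standard-normal stand-in purely for illustration.
    return -0.5 * theta**2

def metropolis_hastings(n_samples, theta0=0.0, step=1.0):
    samples = np.empty(n_samples)
    theta = theta0
    for i in range(n_samples):
        proposal = theta + step * rng.normal()   # symmetric random-walk proposal
        log_ratio = log_post(proposal) - log_post(theta)
        if np.log(rng.uniform()) < log_ratio:    # accept with prob. min(1, ratio)
            theta = proposal
        samples[i] = theta                       # on rejection, keep current value
    return samples

draws = metropolis_hastings(10_000)
print(draws.mean())  # sample mean approximates the posterior mean
```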

But I feel like I must be misunderstanding something. It sounds like we have a posterior distribution, sample from it, and then use that sample as an approximation of the posterior distribution. But if we have the posterior distribution to begin with, why do we need to sample from it to approximate it?

Best Answer

This question has likely been considered already on this forum.

When you state that you "have the posterior distribution", what exactly do you mean? "Having" a function of $\theta$ that I know is proportional to the posterior, namely
$$\pi(\theta|x) \propto \pi(\theta) \times f(x|\theta),$$
for instance the completely artificial target
$$\pi(\theta|x)\propto\exp\{-||\theta-x||^2-||\theta+x||^4-||\theta-2x||^6\},\qquad x,\theta\in\mathbb{R}^{18},$$
does not tell me

  1. the posterior expectation of a function of $\theta$, e.g., $\mathbb{E}[\mathfrak{h}(\theta)|x]$, the posterior mean that operates as a Bayesian estimator under standard losses;
  2. the optimal decision under an arbitrary utility function, i.e., the decision that minimises the expected posterior loss;
  3. a 90% or 95% range of uncertainty on the parameter(s), on a sub-vector of the parameter(s), or on a function of the parameter(s), a.k.a. the HPD region$$\{h=\mathfrak{h}(\theta);\ \pi^\mathfrak{h}(h)\ge \underline{h}\};$$
  4. the most likely model, when choosing between setting some components of the parameter(s) to specific values and keeping them unknown (and random).

These are only examples of the many uses of the posterior distribution. In all but the simplest cases, I cannot provide the answers by staring at the posterior density; I need to resort to numerical methods such as Monte Carlo and Markov chain Monte Carlo (MCMC), as in the sketch below.
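
For instance, here is a minimal sketch (in Python; the sampler settings, starting point, and observed $x$ are arbitrary illustrative choices, not part of the original answer) of how items 1 and 3 can be approximated for the artificial target above, using nothing beyond the unnormalized density:

```python
import numpy as np

# Random-walk Metropolis-Hastings on the artificial 18-dimensional target
# pi(theta|x) proportional to exp{-||theta-x||^2 - ||theta+x||^4 - ||theta-2x||^6}.
rng = np.random.default_rng(1)
d = 18
x = rng.normal(size=d)  # hypothetical "observed" x; any fixed vector works

def log_target(theta):
    # log pi(theta|x) up to an additive constant
    return -(np.sum((theta - x)**2)
             + np.sum((theta + x)**2)**2      # ||.||^4 = (||.||^2)^2
             + np.sum((theta - 2*x)**2)**3)   # ||.||^6 = (||.||^2)^3

def rw_metropolis(n, step=0.05):
    theta = x.copy()                          # arbitrary starting point
    out = np.empty((n, d))
    for i in range(n):
        prop = theta + step * rng.normal(size=d)
        # Accept with probability min(1, target ratio); the unknown
        # normalising constant cancels in the ratio.
        if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
            theta = prop
        out[i] = theta
    return out

draws = rw_metropolis(50_000)[10_000:]             # discard burn-in (arbitrary length)
post_mean = draws.mean(axis=0)                     # item 1: estimate of E[theta|x]
lo, hi = np.quantile(draws[:, 0], [0.05, 0.95])    # item 3: 90% interval for the
                                                   # first component
```

(The quantile-based interval is a central credible interval, used here as a simpler stand-in for the HPD region of item 3.)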
