Solved – Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution

bayesian, inference, markov-chain-montecarlo, posterior, simulation

My understanding is that when using a Bayesian approach to estimate parameter values:

  • The posterior distribution is the combination of the prior distribution and the likelihood function.
  • We simulate this by generating a sample from the posterior distribution (e.g., using a Metropolis-Hastings algorithm to propose values and accept them with a probability based on the ratio of posterior densities at the proposed and current values; see the sketch after this list).
  • Once we have generated this sample, we use it to approximate the posterior distribution and quantities such as its mean.
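
For reference, here is a minimal sketch of the Metropolis-Hastings step (a made-up one-dimensional example with a standard-normal target; all names are illustrative). Note that a proposal is accepted with probability $\min(1, \text{posterior density ratio})$, implemented by comparing the log-ratio to the log of a uniform draw, rather than by a fixed threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Unnormalized log-posterior (log prior + log likelihood);
    # a standard-normal stand-in purely for illustration.
    return -0.5 * theta**2

def metropolis_hastings(n_samples, theta0=0.0, step=1.0):
    samples = np.empty(n_samples)
    theta = theta0
    for i in range(n_samples):
        proposal = theta + step * rng.normal()   # symmetric random-walk proposal
        log_ratio = log_post(proposal) - log_post(theta)
        if np.log(rng.uniform()) < log_ratio:    # accept with prob. min(1, ratio)
            theta = proposal
        samples[i] = theta                       # on rejection, keep current value
    return samples

draws = metropolis_hastings(10_000)
print(draws.mean())  # sample mean approximates the posterior mean
```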

But I feel like I must be misunderstanding something. It sounds like we have a posterior distribution, sample from it, and then use that sample as an approximation of the posterior distribution. But if we have the posterior distribution to begin with, why do we need to sample from it to approximate it?

Best Answer

This question has likely been considered already on this forum.

When you state that you "have the posterior distribution", what exactly do you mean? "Having" a function of $\theta$ that I know is proportional to the posterior, namely
$$\pi(\theta|x) \propto \pi(\theta) \times f(x|\theta),$$
for instance the completely artificial target
$$\pi(\theta|x)\propto\exp\{-||\theta-x||^2-||\theta+x||^4-||\theta-2x||^6\},\qquad x,\theta\in\mathbb{R}^{18},$$
does not tell me

  1. the posterior expectation of a function of $\theta$, e.g., $\mathbb{E}[\mathfrak{h}(\theta)|x]$, the posterior mean that operates as a Bayesian estimator under standard losses;
  2. the optimal decision under an arbitrary utility function, i.e., the decision that minimises the expected posterior loss;
  3. a 90% or 95% range of uncertainty on the parameter(s), on a sub-vector of the parameter(s), or on a function of the parameter(s), a.k.a. the HPD region$$\{h=\mathfrak{h}(\theta);\ \pi^\mathfrak{h}(h)\ge \underline{h}\};$$
  4. the most likely model, when choosing between setting some components of the parameter(s) to specific values and keeping them unknown (and random).

These are only examples of the many uses of the posterior distribution. In all but the simplest cases, I cannot provide the answers by staring at the posterior density; I need to resort to numerical methods such as Monte Carlo and Markov chain Monte Carlo (MCMC), as in the sketch below.
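
For instance, here is a minimal sketch (in Python; the sampler settings, starting point, and observed $x$ are arbitrary illustrative choices, not part of the original answer) of how items 1 and 3 can be approximated for the artificial target above, using nothing beyond the unnormalized density:

```python
import numpy as np

# Random-walk Metropolis-Hastings on the artificial 18-dimensional target
# pi(theta|x) proportional to exp{-||theta-x||^2 - ||theta+x||^4 - ||theta-2x||^6}.
rng = np.random.default_rng(1)
d = 18
x = rng.normal(size=d)  # hypothetical "observed" x; any fixed vector works

def log_target(theta):
    # log pi(theta|x) up to an additive constant
    return -(np.sum((theta - x)**2)
             + np.sum((theta + x)**2)**2      # ||.||^4 = (||.||^2)^2
             + np.sum((theta - 2*x)**2)**3)   # ||.||^6 = (||.||^2)^3

def rw_metropolis(n, step=0.05):
    theta = x.copy()                          # arbitrary starting point
    out = np.empty((n, d))
    for i in range(n):
        prop = theta + step * rng.normal(size=d)
        # Accept with probability min(1, target ratio); the unknown
        # normalising constant cancels in the ratio.
        if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
            theta = prop
        out[i] = theta
    return out

draws = rw_metropolis(50_000)[10_000:]             # discard burn-in (arbitrary length)
post_mean = draws.mean(axis=0)                     # item 1: estimate of E[theta|x]
lo, hi = np.quantile(draws[:, 0], [0.05, 0.95])    # item 3: 90% interval for the
                                                   # first component
```

(The quantile-based interval is a central credible interval, used here as a simpler stand-in for the HPD region of item 3.)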
