We don't use MCMC to calculate $p(\theta | y)$ for each value (or many values) of $\theta$. What MCMC (or the special case of Gibbs sampling) does is generate a (large) random sample from $p(\theta | y)$. Note that $p(\theta | y)$ is never actually calculated; you have to do something with that vector (or matrix) of random numbers to estimate features of $p(\theta | y)$. Since you're not calculating $p(\theta | y)$ for lots of values of $\theta$, you don't need a Gibbs (or MCMC) loop inside a $\theta$ loop - just one (long) Gibbs (or MCMC) loop.
EDIT in response to an update to the question: We do not need to integrate the distribution to get the constant of integration (CoI)! The whole value of MCMC is found in situations where we can't calculate the CoI. Using MCMC, we can still generate random numbers from the distribution. If we could calculate the CoI, we could often just calculate the probabilities directly, without the need to resort to simulation.
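To see why the CoI never has to be computed, write the posterior as $p(\theta|y) = q(\theta)/Z$, where $q$ is something we can evaluate (prior times likelihood) and $Z$ is the unknown CoI. The Metropolis-Hastings acceptance probability, with proposal density $g$, is

$$\alpha = \min\left(1,\ \frac{p(\theta'|y)\,g(\theta)}{p(\theta|y)\,g(\theta')}\right) = \min\left(1,\ \frac{q(\theta')/Z\cdot g(\theta)}{q(\theta)/Z\cdot g(\theta')}\right) = \min\left(1,\ \frac{q(\theta')\,g(\theta)}{q(\theta)\,g(\theta')}\right),$$

so the unknown $Z$ cancels: only ratios of the unnormalized posterior are ever needed.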
Once again, we are NOT calculating $p(\theta|y)$ using MCMC; we are generating random numbers from $p(\theta|y)$ using MCMC. A very different thing.
Here's an example from a simple case: the posterior distribution of the rate parameter of an Exponential distribution with a uniform prior. The data is in `x`, and we generate `N <- 10000` samples from the posterior distribution. Observe that we only ever calculate $p(x|\theta)$ in the program.
x <- rexp(100)                   # simulated data
N <- 10000
theta <- rep(0, N)
theta[1] <- cur_theta <- 1       # starting value
for (i in 1:N) {
  prop_theta <- runif(1, 0, 5)   # "independence" sampler
  # Acceptance probability reduces to a likelihood ratio here
  alpha <- exp(sum(dexp(x, prop_theta, log=TRUE)) - sum(dexp(x, cur_theta, log=TRUE)))
  if (runif(1) < alpha) cur_theta <- prop_theta
  theta[i] <- cur_theta
}
hist(theta)
The resulting histogram approximates the posterior density $p(\theta|x)$.
Note that the logic is simplified by our choice of sampler (the `prop_theta` line): since the proposal density is the same in both directions, a couple of other terms in the acceptance-probability line (`alpha <- ...`) cancel out, so they don't need to be calculated at all. It's also simplified by our choice of a uniform prior. Obviously this code could be improved a lot, but it's written for expository rather than functional purposes.
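To emphasize the point from the first paragraph: once we have the draws, posterior quantities are estimated from the sample itself, never by evaluating $p(\theta|y)$. A minimal sketch (re-running the sampler above with a seed, then summarizing; the analytic check relies on the flat prior making the posterior a Gamma$(n+1, \sum x_i)$, with the truncation to $(0,5)$ being negligible here):

```r
set.seed(42)
x <- rexp(100)                   # same simulated-data setup as above
N <- 10000
theta <- rep(0, N)
cur_theta <- 1
for (i in 1:N) {
  prop_theta <- runif(1, 0, 5)
  alpha <- exp(sum(dexp(x, prop_theta, log=TRUE)) - sum(dexp(x, cur_theta, log=TRUE)))
  if (runif(1) < alpha) cur_theta <- prop_theta
  theta[i] <- cur_theta
}
# Posterior summaries come straight from the draws:
post_mean <- mean(theta)                       # estimate of E(theta | x)
post_ci   <- quantile(theta, c(0.025, 0.975))  # 95% credible interval
# Compare with the (here known) Gamma(n + 1, sum(x)) posterior mean:
c(mcmc = post_mean, exact = (length(x) + 1) / sum(x))
```

The two numbers should agree closely; in realistic problems only the first one is available, which is exactly why we simulate.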
Here's a link to a question with several answers giving sources for learning more about MCMC.
Best Answer
This question has likely been considered already on this forum.
When you state that you "have the posterior distribution", what exactly do you mean? "Having" a function of $\theta$ that I know is proportional to the posterior, namely $$\pi(\theta|x) \propto \pi(\theta) \times f(x|\theta),$$ for instance the completely artificial target $$\pi(\theta|x)\propto\exp\{-||\theta-x||^2-||\theta+x||^4-||\theta-2x||^6\},\ \ x,\theta\in\mathbb{R}^{18},$$ does not tell me what is, for instance, the posterior mean of $\theta$, the posterior probability of a given region, or a posterior quantile.
These are only examples of the many usages of the posterior distribution. In all but the simplest cases, I cannot provide the answers by staring at the posterior density and instead need to proceed through numerical resolutions like Monte Carlo and Markov chain Monte Carlo methods.
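For concreteness, here is how one could still draw from that artificial 18-dimensional target despite its unknown normalizing constant: a plain random-walk Metropolis sketch (the "data" vector `x`, the starting point, and the proposal standard deviation are all arbitrary illustrative choices, not part of the original example):

```r
set.seed(1)
d <- 18
x <- rnorm(d)                      # arbitrary "data" vector for illustration
# Log of the unnormalized target: -||t - x||^2 - ||t + x||^4 - ||t - 2x||^6
log_target <- function(t)
  -sum((t - x)^2) - sum((t + x)^2)^2 - sum((t - 2*x)^2)^3
N <- 5000
draws <- matrix(0, N, d)
cur <- x                           # start somewhere with non-negligible mass
cur_lp <- log_target(cur)
for (i in 1:N) {
  prop <- cur + rnorm(d, sd = 0.05)  # random-walk proposal; sd is a tuning choice
  prop_lp <- log_target(prop)
  # Accept/reject using only the log of the unnormalized target
  if (log(runif(1)) < prop_lp - cur_lp) { cur <- prop; cur_lp <- prop_lp }
  draws[i, ] <- cur
}
colMeans(draws)                    # Monte Carlo estimate of the posterior mean
```

Quantities like posterior means, probabilities of regions, or quantiles are then read off the rows of `draws`, which is precisely what "staring at the density" cannot deliver.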