Solved – How to sample using MCMC from a posterior distribution in general

bayesian, markov-chain-montecarlo, metropolis-hastings, sampling, simulation

Assume one has the posterior distribution of a parameter, $p(\theta|y)$, and what I mean by "having it" is that for each value of $\theta$ one can use a Monte Carlo method (plus MCMC) to calculate $p(\theta|y)$. Now my question is: if I want to sample from $p(\theta|y)$, then basically I have to run Gibbs sampling (for example) to sample from the distribution, and at each point I have to run a Monte Carlo calculation to get the value of $p(\theta|y)$ at that point, right? I.e., it needs two loops, one inside the other. Is this correct?

Since I got an answer to this question and thought my question might be vague, I will try to clarify it a bit more:

From what I have learned after a week of reading about Monte Carlo methods and MCMC, I understand (correct me if I am wrong) that:
$$p(\theta|y)=\frac{p(y|\theta)p(\theta)}{\int_{\Theta}{p(y|\theta)p(\theta)}\text{d}\theta}.$$

Now suppose we only have a sampling algorithm for $\theta$ and can only evaluate $p(y|\theta)$ explicitly (and not the other functions!). It seems that to get values of $p(\theta|y)$ one would need to numerically integrate the denominator, and then, to generate a sample from $p(\theta|y)$, apply a sampling scheme like Gibbs sampling: each new transition in the parameter space would sample from $p(\theta|y)$, and to evaluate that density the ratio above would have to be computed.

Best Answer

We don't use MCMC to calculate $p(\theta|y)$ for each value (or many values) of $\theta$. What MCMC (or the special case of Gibbs sampling) does is generate a (large) random sample from $p(\theta|y)$. Note that $p(\theta|y)$ is not being calculated; you have to do something with that vector (or matrix) of random numbers to estimate $p(\theta|y)$, for example form a histogram or a kernel density estimate. Since you're not calculating $p(\theta|y)$ for lots of values of $\theta$, you don't need a Gibbs (or MCMC) loop inside a $\theta$ loop; you just need one (long) Gibbs (or MCMC) loop.
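As a minimal sketch of what "doing something" with those random numbers can look like (this is my own illustration, not part of the original answer; theta stands in for any vector of MCMC draws, here faked with rgamma just so the snippet runs):

theta <- rgamma(10000, shape = 101, rate = 100)  # placeholder draws standing in for MCMC output

mean(theta)                       # estimate of the posterior mean
quantile(theta, c(0.025, 0.975))  # 95% credible interval
plot(density(theta))              # kernel density estimate of p(theta | y)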

EDIT in response to an update to the question: We do not need to integrate the distribution to get the constant of integration (CoI)! The whole value of MCMC is found in situations where we can't calculate the CoI. Using MCMC, we can still generate random numbers from the distribution. If we could calculate the CoI, we could just calculate the probabilities directly, without needing to resort to simulation.
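To see why the CoI never has to be computed, consider the standard Metropolis-Hastings acceptance probability for a proposal $\theta'$ given the current state $\theta$ and proposal density $q$ (written out here for clarity):

$$\alpha = \min\!\left(1,\ \frac{p(\theta'|y)\,q(\theta\mid\theta')}{p(\theta|y)\,q(\theta'\mid\theta)}\right)
         = \min\!\left(1,\ \frac{p(y|\theta')\,p(\theta')\,q(\theta\mid\theta')}{p(y|\theta)\,p(\theta)\,q(\theta'\mid\theta)}\right),$$

because the normalizing constant $\int_{\Theta} p(y|\theta)p(\theta)\,\text{d}\theta$ appears in both numerator and denominator and cancels.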

Once again, we are NOT calculating $p(\theta|y)$ using MCMC; we are generating random numbers from $p(\theta|y)$ using MCMC. A very different thing.

Here's an example from a simple case: the posterior distribution of the rate parameter of an Exponential distribution under a uniform prior. The data are in x, and we generate N <- 10000 draws from the posterior distribution. Observe that the program only ever evaluates $p(x|\theta)$.

x <- rexp(100)   # Simulated data: 100 draws from an Exponential(rate = 1)

N <- 10000
theta <- rep(0, N)                # Storage for the posterior draws
theta[1] <- cur_theta <- 1        # Starting value
for (i in 1:N) {
   prop_theta <- runif(1, 0, 5)   # "Independence" sampler: propose from Uniform(0, 5)
   # Acceptance ratio: just the likelihood ratio, computed on the log scale for stability
   alpha <- exp(sum(dexp(x, prop_theta, log=TRUE)) - sum(dexp(x, cur_theta, log=TRUE)))
   if (runif(1) < alpha) cur_theta <- prop_theta   # Accept or reject the proposal
   theta[i] <- cur_theta          # Record the current state
}

hist(theta)

And the histogram:

[Figure: histogram of the posterior draws of $\theta$]
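As a quick sanity check (my own addition, not part of the original answer): with data $x_1,\dots,x_n \sim \text{Exp}(\theta)$ and a Uniform$(0,5)$ prior on the rate, the posterior is proportional to $\theta^n e^{-\theta \sum x_i}$ on $(0,5)$, i.e. a Gamma$(n+1,\ \sum x_i)$ density truncated to $(0,5)$; the truncation is negligible here because the posterior mass sits near 1. One might overlay that density on the histogram, continuing from the sampler above:

# Continuing from the sampler above: overlay the analytic posterior on the draws
hist(theta, freq = FALSE)
t_grid <- seq(0.01, 5, length.out = 500)
# Gamma(n + 1, rate = sum(x)) density implied by the exponential likelihood and
# the (effectively untruncated) uniform prior
lines(t_grid, dgamma(t_grid, shape = length(x) + 1, rate = sum(x)))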

Note that the logic is simplified by our choice of sampler (the prop_theta line): because the Uniform(0,5) proposal density is the same for the current and proposed values, the proposal terms in the acceptance ratio (the alpha <- ... line) cancel and don't need to be calculated at all. It is also simplified by our choice of a uniform prior, whose terms cancel in the same way. Obviously this code could be improved a lot, but it is written for expository rather than functional purposes.
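For comparison, here is a sketch (my own illustration, not part of the original answer) of what the acceptance step looks like when the prior terms do not cancel, using a symmetric random-walk proposal and a hypothetical Gamma(2, 1) prior on the rate; everything is kept on the log scale:

x <- rexp(100)                      # same kind of simulated data as above
N <- 10000
theta <- rep(0, N)
theta[1] <- cur_theta <- 1

log_post <- function(th) {
  if (th <= 0) return(-Inf)                        # rate must be positive
  sum(dexp(x, th, log = TRUE)) +                   # log-likelihood p(x | theta)
    dgamma(th, shape = 2, rate = 1, log = TRUE)    # hypothetical Gamma(2, 1) log-prior
}

for (i in 1:N) {
  prop_theta <- cur_theta + rnorm(1, 0, 0.2)       # symmetric random-walk proposal
  # Symmetric proposal => proposal densities cancel; the prior no longer does
  log_alpha <- log_post(prop_theta) - log_post(cur_theta)
  if (log(runif(1)) < log_alpha) cur_theta <- prop_theta
  theta[i] <- cur_theta
}

hist(theta)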

Here's a link to a question with several answers giving sources for learning more about MCMC.