Bayesian Simulation – When to Use Gibbs Sampling Instead of Metropolis-Hastings?

bayesian · gibbs · markov-chain-montecarlo · metropolis-hastings · simulation

There are different kinds of MCMC algorithms:

  • Metropolis-Hastings
  • Gibbs
  • Importance/rejection sampling (related).

Why would one use Gibbs sampling instead of Metropolis-Hastings? I suspect there are cases when inference is more tractable with Gibbs sampling than with Metropolis-Hastings, but I am not clear on the specifics.

Best Answer

Firstly, let me note [somewhat pedantically] that

There are several different kinds of MCMC algorithms: Metropolis-Hastings, Gibbs, importance/rejection sampling (related).

importance and rejection sampling methods are not MCMC algorithms, because they are not based on Markov chains. Actually, importance sampling does not produce a sample from the target distribution, $f$ say, but only importance weights, $\omega$ say, to be used in Monte Carlo approximations of integrals related to $f$. Using those weights as probabilities to produce a sample does not lead to a proper sample from $f$, even though unbiased estimators of expectations under $f$ can be produced.
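
To illustrate this distinction, here is a minimal Python sketch of self-normalised importance sampling (the target, proposal, and test function are arbitrary choices for the illustration, not taken from the answer): the output is a weighted estimate of an expectation under $f$, not a sample from $f$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: estimate E_f[X^2] = 1 for the target f = N(0, 1),
# while sampling from a wider proposal g = N(0, 2^2) instead of f.
n = 100_000
x = rng.normal(0.0, 2.0, size=n)          # draws from g, not from f

# Importance weights omega = f(x) / g(x), computed on the log scale
log_f = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_g = -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
omega = np.exp(log_f - log_g)

# Self-normalised weighted estimate of the integral -- the weighted
# points (x, omega) are not themselves a sample from f
estimate = np.sum(omega * x**2) / np.sum(omega)
```

Resampling the `x` values with probabilities proportional to `omega` (so-called sampling importance resampling) only gives an approximate sample from $f$, which is the point made above.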

Secondly, the question

Why would someone go with Gibbs sampling instead of Metropolis-Hastings? I suspect there are cases when inference is more tractable with Gibbs sampling than with Metropolis-Hastings

does not have an answer, in that a Metropolis-Hastings sampler can be almost anything, including a Gibbs sampler. I replied in rather detailed terms to an earlier and similar question. But let me add a few, possibly redundant, points here:

The primary reason why Gibbs sampling was introduced was to break the curse of dimensionality (which impacts both rejection and importance sampling) by producing a sequence of low-dimensional simulations that still converge to the right target, even though the dimension of the target still impacts the speed of convergence.

Metropolis-Hastings samplers are designed to create a Markov chain (like Gibbs sampling) based on a proposal (like importance and rejection sampling), correcting for the wrong density through an acceptance-rejection step. But an important point is that the two are not opposed: namely, Gibbs sampling may require Metropolis-Hastings steps when facing complex, if low-dimensional, conditional targets, while Metropolis-Hastings proposals may be built on approximations to (Gibbs) full conditionals. In a formal definition, Gibbs sampling is a special case of the Metropolis-Hastings algorithm with an acceptance probability of one. (By the way, I object to the use of inference in that quote, as I would reserve it for statistical purposes, while those samplers are numerical devices.)
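
As an illustration of the "acceptance probability of one" point, here is a minimal Gibbs sampler in Python for a bivariate normal target, where both full conditionals are available in closed form, so every update is a direct draw (the correlation value and chain length are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative target: bivariate normal with zero means, unit variances,
# and correlation rho. Each full conditional is itself normal:
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2),
# so each Gibbs update is an exact draw (never rejected).
rho = 0.8
n_iter = 20_000
x = y = 0.0
samples = np.empty((n_iter, 2))

for t in range(n_iter):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[t] = x, y

# Discard an initial burn-in and check the empirical correlation
corr = np.corrcoef(samples[2_000:].T)[0, 1]
```

Each coordinate update here is a one-dimensional simulation, which is exactly the dimension-breaking feature described above; no acceptance step is needed because the proposal is the full conditional itself.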

Usually, Gibbs sampling [understood as running a sequence of low-dimensional conditional simulations] is favoured in settings where the decomposition into such conditionals is easy to implement and fast to run. In settings where such decompositions induce multimodality, and hence difficulty in moving between modes (latent variable models like mixture models come to mind), using a more global proposal in a Metropolis-Hastings algorithm may produce higher efficiency. But the drawback lies in choosing the proposal distribution in the Metropolis-Hastings algorithm.
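
A minimal sketch of such a global move, as a random-walk Metropolis-Hastings sampler on a bimodal target (the normal mixture, the proposal scale, and all constants below are illustrative choices, not taken from the answer): a proposal wide enough to span both modes lets the chain jump between them in a single step.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative bimodal target: equal mixture of N(-3, 1) and N(3, 1),
# with the normalising constant dropped (it cancels in the MH ratio)
def log_target(x):
    return np.logaddexp(-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2)

n_iter = 50_000
scale = 5.0                 # proposal standard deviation (a tuning choice)
x = 0.0
chain = np.empty(n_iter)

for t in range(n_iter):
    prop = x + rng.normal(0.0, scale)
    # Accept with probability min(1, target(prop) / target(x)),
    # computed on the log scale for numerical stability
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    chain[t] = x
```

The proposal scale is the difficulty flagged above: too small and the chain rarely crosses between modes, too large and most proposals are rejected, and neither failure mode announces itself without diagnostics.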