Solved – Proposal distribution – Metropolis Hastings MCMC

markov-chain-montecarlo, metropolis-hastings, monte carlo, sampling

In Metropolis-Hastings Markov chain Monte Carlo, the proposal distribution can be anything, including the Gaussian (according to Wikipedia).

Q: What's the motivation for using anything other than Gaussian? Gaussian works, it's easy to evaluate, it's fast and everybody understands it. Why would I consider anything else?

Q: Since the proposal distribution can be anything, can I use a uniform distribution?

Best Answer

A1: Indeed, the Gaussian distribution is probably the most used proposal distribution, primarily due to its ease of use. However, one might want to use other proposal distributions for the following reasons:

  1. Heavy Tails: The Gaussian distribution has light tails. This means that $N(x_{t-1}, \sigma^2)$ will almost always propose values within $(x_{t-1} - 3\sigma, x_{t-1} + 3\sigma)$. A $t$ distribution has heavier tails and can therefore propose values that are farther away. This lets the resulting Markov chain explore the state space more freely and can reduce autocorrelation. The plot below shows $N(0,1)$ compared to $t_1$; the $t$ is much more likely to propose values far from 0.

[Figure: density of $N(0,1)$ compared with $t_1$]
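To put a number on the difference in tails, here is a minimal sketch that computes the two-sided tail probability $P(|X| > k)$ for $N(0,1)$ and for $t_1$ (the standard Cauchy) from their closed-form CDFs. The function names are made up for this illustration, and only the standard library is used.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| > k) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

def cauchy_two_sided_tail(k):
    """P(|T| > k) for T ~ t_1 (standard Cauchy), via its arctan CDF."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

# Compare how much proposal mass lies more than k scale units from the centre.
for k in (1, 2, 3, 5):
    print(k, normal_two_sided_tail(k), cauchy_two_sided_tail(k))
```

At $k = 3$ the Gaussian puts roughly 0.27% of its mass outside the interval, while the $t_1$ puts about 20% there, which is why it proposes distant values so much more often.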

  2. Restricted Space: The Gaussian distribution is defined on all reals. If the distribution you are sampling from is, let's say, only defined on the positives or on $(0,1)$, then the Gaussian will often propose values for which the target density is 0. Such values are immediately rejected, and the Markov chain does not move from its current spot. This essentially wastes a draw from the Markov chain. Instead, if you are on the positives, you could use a Gamma distribution, and on $(0,1)$ you could use a Beta. Because these proposals are not symmetric, the acceptance ratio must include the Hastings correction $q(x_{t-1} \mid x^*) / q(x^* \mid x_{t-1})$, where $x^*$ is the proposed value.
  3. Multiple Modes: When the target distribution is multi-modal, a Gaussian proposal will likely lead to the Markov chain getting stuck near one mode. This is in part due to the light tails of the Gaussian. Instead, people use gradient-based proposals or a mixture of Gaussians as a proposal.
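The restricted-space point can be sketched in code. The example below is only an illustration under assumptions that are not in the answer: the target is taken to be a Gamma(shape 3, rate 1) density on the positives, the Gamma proposal is parameterised to have its mean at the current state, and all function names and tuning values (`a`, `sigma`) are hypothetical. It counts how many Gaussian random-walk proposals land outside the support, and shows the Hastings correction that an asymmetric Gamma proposal requires.

```python
import math
import random

def log_target(x):
    """Unnormalised log density of the assumed Gamma(shape=3, rate=1) target."""
    if x <= 0:
        return -math.inf
    return 2.0 * math.log(x) - x

def log_gamma_proposal(y, x, a):
    """Log density at y of a Gamma proposal with shape a and mean x."""
    scale = x / a
    return ((a - 1.0) * math.log(y) - y / scale
            - math.lgamma(a) - a * math.log(scale))

def mh_gaussian_proposal(n, x0=1.0, sigma=2.0, seed=0):
    """Random-walk MH with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, chain, wasted = x0, [], 0
    for _ in range(n):
        y = rng.gauss(x, sigma)
        if y <= 0:
            wasted += 1  # target density is 0 here: automatic rejection
        elif rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        chain.append(x)
    return chain, wasted

def mh_gamma_proposal(n, x0=1.0, a=5.0, seed=0):
    """MH with an asymmetric Gamma proposal that stays on the positives."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n):
        y = rng.gammavariate(a, x / a)  # shape a, scale x / a, so mean x
        if y > 0:  # guard against floating-point underflow to 0.0
            # Hastings correction: q(x | y) / q(y | x) on the log scale.
            log_ratio = (log_target(y) - log_target(x)
                         + log_gamma_proposal(x, y, a)
                         - log_gamma_proposal(y, x, a))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x = y
        chain.append(x)
    return chain

gauss_chain, wasted = mh_gaussian_proposal(20000)
gamma_chain = mh_gamma_proposal(20000)
print("Gaussian proposals outside the support:", wasted)
print("chain means:", sum(gauss_chain) / len(gauss_chain),
      sum(gamma_chain) / len(gamma_chain))
```

Both chains target the same distribution, so their means should both be close to 3. The difference is that the Gamma proposal never spends a draw on a value with zero target density, at the cost of evaluating the proposal density twice per step.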

You can find more discussion here and here.

A2: Yes, you can use a Uniform distribution as long as its support is bounded (a Uniform on an unbounded support is improper, since its density integrates to $\infty$). For example, you can use a Uniform on $(x_{t-1} - c, x_{t-1} + c)$ for some fixed $c > 0$.
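As a minimal sketch of such a sampler, the code below runs random-walk Metropolis with a Uniform proposal on $(x_{t-1} - c, x_{t-1} + c)$. The standard normal target, the half-width `c = 2.0`, and the function names are assumptions chosen for illustration. Since this proposal is symmetric, the acceptance ratio reduces to the ratio of target densities.

```python
import math
import random

def log_std_normal(x):
    """Unnormalised log density of the assumed N(0, 1) target."""
    return -0.5 * x * x

def mh_uniform_proposal(n, c=2.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a Uniform(x - c, x + c) proposal."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n):
        y = rng.uniform(x - c, x + c)
        # Symmetric proposal: accept with probability min(1, pi(y) / pi(x)).
        log_ratio = log_std_normal(y) - log_std_normal(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n

chain, rate = mh_uniform_proposal(20000)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print("acceptance rate:", rate, "mean:", mean, "variance:", var)
```

The half-width $c$ plays the same role as $\sigma$ in a Gaussian proposal: a small $c$ gives high acceptance but slow movement, while a large $c$ gives bold proposals that are rejected more often.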