Solved – Acceptance rates for Metropolis-Hastings with uniform candidate distribution

bayesianestimationmarkov-chain-montecarlosampling

When running the Metropolis-Hastings algorithm with uniform candidate distributions, what is the rationale of having acceptance rates around 20%?

My thinking is: once the true (or close to true) parameter values are discovered, then no new set of candidate parameter values from the same uniform interval would increase the value of the likelihood function. Therefore, the more iterations I run, the lower the acceptance rates I should get.

Where am I wrong in this thinking? Many thanks!

Here is the illustration of my calculations:

$$Acceptance\_rate =
\exp \{l(\theta_c|y) + \log(p(\theta_c)) – [l(\theta^*|y) + \log(p(\theta^*) ]\},$$

where $l$ is the log-likelihood.

As $\theta$ candidates are always taken from the same uniform interval,

$$p(\theta_c) = p(\theta^*).$$

Therefore acceptance rate calculation shrinks down to:

$$Acceptance\_rate = \exp \{l(\theta_c | y) – [l(\theta^* | y) ]\}$$

The acceptance rule of $\theta_c$ is then as follows:

If $U \le Acceptance\_rate $, where $U$ is draw from uniform distribution in interval $[0,1]$, then

$$\theta^* = \theta_c,$$

else draw $\theta_c$ from uniform distribution in interval $[\theta_{min}, \theta_{max}]$

Best Answer

I believe that Weak convergence and optimal scaling of random walk Metropolis algorithms by Roberts, Gelman and Gilks is the source for the 0.234 optimal acceptance rate.

What the paper shows is that, under certain assumptions, you can scale the random walk Metropolis-Hastings algorithm as the dimension of the space goes to infinity to get a limiting diffusion for each coordinate. In the limit, the diffusion can be seen as "most efficient" if the acceptance rate takes the value 0.234. Intuitively, it is a tradeoff between making to many small accepted steps and making to many large proposals that get rejected.

The Metropolis-Hastings algorithm is not really an optimization algorithm, in contrast to simulated annealing. It is an algorithm that is supposed to simulate from the target distribution, hence the acceptance probability should not be driven towards 0.