Markov Chain – Dynamically Adjusting Parameters of Markov Chain

Tags: autocorrelation, markov-chain-montecarlo, markov-process, metropolis-hastings

I am using a Metropolis algorithm to generate samples from a complicated (high-dimensional) probability distribution. As is common, the proposed updates depend on some "step size" parameter $\epsilon$, such that

  • too small an $\epsilon$ leads to very small updates and thus long autocorrelation times,
  • too large an $\epsilon$ leads to low acceptance rates and thus long autocorrelation times.

Common wisdom is that one should choose $\epsilon$ such that the overall acceptance probability is somewhere in the 50%-90% range in order to optimize for autocorrelation times. Usually this tuning is done by hand, i.e. using a couple of trial runs.
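For concreteness, a minimal random-walk Metropolis sampler with a fixed step size $\epsilon$ might look like the sketch below (the standard-normal target in the usage note is just a stand-in for the complicated distribution; function and variable names are my own):

```python
import numpy as np

def rw_metropolis(log_density, x0, eps, n_steps, rng):
    """Random-walk Metropolis with isotropic Gaussian proposals of scale eps.

    Returns the chain and the empirical acceptance rate."""
    x = np.asarray(x0, dtype=float)
    logp = log_density(x)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        proposal = x + eps * rng.standard_normal(x.size)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), done in log space.
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_steps
```

Hand-tuning then amounts to running this a few times with different `eps` values and inspecting the returned acceptance rate, e.g. `rw_metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(2), 1.0, 5000, np.random.default_rng(0))`.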

Now the question: Is it valid to adjust $\epsilon$ dynamically during the Markov process? By "valid" I mean: is the stationary distribution of the process still the same?

For example, I could track the acceptance rate so far, and:

  • If the rate is $<0.9$, decrease $\epsilon$ a little
  • If the rate is $>0.9$, increase $\epsilon$ a little

I could do this automatic adjustment after every step. I don't see why this would be wrong (after all, the stationary distribution does not depend on $\epsilon$). But it seems very fishy to me to make the simulation parameters depend on the simulation results.
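The per-step scheme described above can be sketched as follows (the tuning constants `target` and `factor` are illustrative, not prescribed anywhere):

```python
import numpy as np

def naive_adaptive_metropolis(log_density, x0, eps0, n_steps,
                              target=0.9, factor=1.01, seed=0):
    """Random-walk Metropolis that rescales eps after EVERY step based on
    the running acceptance rate of the whole chain so far. Note: because
    each transition depends on the entire past through eps, the process
    is no longer a Markov chain."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp = log_density(x)
    eps = eps0
    accepted = 0
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        proposal = x + eps * rng.standard_normal(x.size)
        logp_prop = log_density(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples[i] = x
        # Adapt: shrink eps if accepting too rarely, grow it if too often.
        rate = accepted / (i + 1)
        eps = eps / factor if rate < target else eps * factor
    return samples, eps
```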

Could someone help me to wrap my head around this?

Best Answer

First, the optimal acceptance rate for random walk Metropolis-Hastings is lower than 50% (for MALA it is higher, about 57%). For instance, to quote from an answer to an earlier X validated question,

I believe that Weak convergence and optimal scaling of random walk Metropolis algorithms by Roberts, Gelman and Gilks is the source for the 0.234 optimal acceptance rate.

What the paper shows is that, under certain assumptions, you can scale the random walk Metropolis-Hastings algorithm as the dimension of the space goes to infinity to get a limiting diffusion for each coordinate. In the limit, the diffusion can be seen as "most efficient" if the acceptance rate takes the value 0.234. Intuitively, it is a tradeoff between making too many small accepted steps and making too many large proposals that get rejected.

Second, continuing to tune $\epsilon$ based on the past of the chain invalidates the Markov justification of MCMC algorithms, unless the tuning is controlled, for instance by imposing diminishing adaptation. Entries on X Validated with the keywords adaptive and metropolis provide some references on the topic.
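As one sketch of a controlled scheme: adapt $\log\epsilon$ by a Robbins-Monro step whose magnitude decays like $n^{-\gamma}$ with $\gamma \in (1/2, 1]$, so the amount of adaptation per step vanishes. The target rate 0.234 and decay exponent below are illustrative choices, not the only valid ones:

```python
import numpy as np

def diminishing_adaptation_metropolis(log_density, x0, eps0, n_steps,
                                      target=0.234, gamma=0.6, seed=0):
    """Random-walk Metropolis where log(eps) is nudged toward a target
    acceptance probability with a step size decaying like n**-gamma.
    The decaying step size is what makes the adaptation 'diminishing'."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp = log_density(x)
    log_eps = np.log(eps0)
    samples = np.empty((n_steps, x.size))
    for n in range(1, n_steps + 1):
        proposal = x + np.exp(log_eps) * rng.standard_normal(x.size)
        logp_prop = log_density(proposal)
        # Acceptance probability, computed in log space to avoid overflow.
        alpha = np.exp(min(0.0, logp_prop - logp))
        if rng.uniform() < alpha:
            x, logp = proposal, logp_prop
        # Robbins-Monro update: push eps up when accepting too often,
        # down when accepting too rarely; the n**-gamma factor decays.
        log_eps += n ** -gamma * (alpha - target)
        samples[n - 1] = x
    return samples, np.exp(log_eps)
```

In contrast to adapting by a fixed factor forever, here the total future adaptation shrinks to zero, which is the kind of condition under which adaptive MCMC can still be justified.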
