Solved – What if a Markov chain does not converge in a reasonable amount of time?

convergence, markov-process, monte carlo

I'm doing data analysis using Hamiltonian Monte Carlo (HMC) to sample from the posterior distribution of the weights of a neural network. I'm using the Gelman-Rubin diagnostic, the estimated potential scale reduction (EPSR), to check the convergence of my Markov chains. My neural network has around 317 weights, and I check the convergence of each of the 317 parameters separately.

If I have understood everything correctly, the parameters should have converged if the EPSR value for each of them is < 1.1.
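A minimal sketch of the per-parameter check described above, assuming the post-warm-up draws are stored in a NumPy array of shape (chains, draws, parameters). `psrf` and `psrf_all` are hypothetical helper names, not the poster's code; this is the simplified R-hat without the degrees-of-freedom correction of the original Gelman-Rubin paper, and library implementations such as ArviZ's `rhat` default to a more robust rank-normalized split-chain variant.

```python
import numpy as np


def psrf(chains: np.ndarray) -> float:
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: shape (m, n), i.e. m chains with n post-warm-up draws each.
    Simplified form, without the degrees-of-freedom correction.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains with at least 2 draws each")
    # Between-chain variance B: n times the sample variance of the chain means
    B = n * chains.mean(axis=1).var(ddof=1)
    # Within-chain variance W: average of the per-chain sample variances
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))


def psrf_all(draws: np.ndarray) -> np.ndarray:
    """R-hat for every parameter; draws has shape (m, n, d)."""
    draws = np.asarray(draws, dtype=float)
    return np.array([psrf(draws[:, :, k]) for k in range(draws.shape[2])])


if __name__ == "__main__":
    # Hypothetical stand-in for real HMC output: 4 chains, 2000 draws, 317 weights
    rng = np.random.default_rng(0)
    draws = rng.normal(size=(4, 2000, 317))
    rhat = psrf_all(draws)
    slow = np.flatnonzero(rhat >= 1.1)  # indices of weights that fail the 1.1 rule
    print(f"max R-hat = {rhat.max():.3f}, weights at or above 1.1: {slow.tolist()}")
```

The weights listed in `slow` are the ones worth inspecting individually, for example with trace plots.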

This does indeed happen for most of the parameters, but some weights do not seem to converge in a reasonable amount of time. Some take 100,000 or more samples until they converge, which takes too long for my analysis.

My question is: "What is the appropriate way to proceed if the Markov chains do not converge in a reasonable amount of time? Do I just need to bite the bullet and wait for three months or so?"

Best Answer

To answer your original question:

  1. I am not a huge fan of Gelman-Rubin, mainly because it is somewhat hand-wavy for my taste. However, if you still want to use it, try a multivariate Gelman-Rubin diagnostic, since the joint posterior of the weights may have a complicated dependence structure that the univariate diagnostic cannot capture. See answer here.
  2. I would suggest first looking at trace plots for the weights that are converging slowly. Maybe there is a problem such as multimodality that Gelman-Rubin is not able to catch.
  3. HMC is known to converge fairly quickly in many situations. Maybe focus instead on the quality of the estimates by analysing their Monte Carlo variance. You can find a discussion of the methods here.
  4. To actually improve convergence of the chains, you can try different starting values for the slowly converging chains. You can also try tweaking the HMC tuning parameters (such as the step size and the number of leapfrog steps) wherever possible. It is also possible that HMC just doesn't work well here, and a variant of the Metropolis-Hastings algorithm might work better. I won't be able to say more without knowing more about the problem.
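The multivariate diagnostic from point 1 and a variance check in the spirit of point 3 can be sketched as follows, again assuming draws in a NumPy array of shape (chains, draws, parameters). `multivariate_psrf` follows the Brooks and Gelman (1998) multivariate PSRF formula; `batch_means_mcse` uses non-overlapping batch means, which is one common way to estimate the Monte Carlo standard error and not necessarily the method in the linked discussion. Both function names are hypothetical.

```python
from typing import Optional

import numpy as np


def multivariate_psrf(draws: np.ndarray) -> float:
    """Brooks-Gelman multivariate PSRF for draws of shape (m, n, d).

    Assumes m * (n - 1) is comfortably larger than d, so W is invertible.
    """
    draws = np.asarray(draws, dtype=float)
    m, n, _ = draws.shape
    chain_means = draws.mean(axis=1)                  # shape (m, d)
    centered = draws - chain_means[:, None, :]        # within-chain deviations
    # Pooled within-chain covariance matrix W, shape (d, d)
    W = np.einsum("cni,cnj->ij", centered, centered) / (m * (n - 1))
    # Sample covariance of the chain means, i.e. B / n, shape (d, d)
    B_over_n = np.atleast_2d(np.cov(chain_means, rowvar=False, ddof=1))
    # Largest eigenvalue of W^{-1} (B / n); real because W is positive definite
    lam = np.linalg.eigvals(np.linalg.solve(W, B_over_n)).real.max()
    return float((n - 1) / n + (m + 1) / m * lam)


def batch_means_mcse(x: np.ndarray, batch_size: Optional[int] = None) -> float:
    """Monte Carlo standard error of one chain's mean via non-overlapping batch means."""
    x = np.asarray(x, dtype=float).ravel()
    b = batch_size or int(np.sqrt(x.size))            # default batch size: floor(sqrt(n))
    a = x.size // b                                   # number of complete batches
    if a < 2:
        raise ValueError("need at least two complete batches")
    # Drop the incomplete tail batch, then average within each batch
    batch_means = x[: a * b].reshape(a, b).mean(axis=1)
    sigma2 = b * batch_means.var(ddof=1)              # estimate of the asymptotic variance
    return float(np.sqrt(sigma2 / (a * b)))
```

With 317 weights, `W` is a 317-by-317 matrix, so the pooled number of draws m(n - 1) has to be well above 317 for the multivariate estimate to be stable. Comparing a weight's MCSE with its posterior standard deviation is one way to judge whether more samples are actually needed for that weight.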