What is your preferred method of checking for convergence when using Markov chain Monte Carlo for Bayesian inference, and why?
Tags: bayesian, markov-chain-montecarlo
Related Solutions
You create parameter trace plots to make sure that your proposal distribution is well calibrated, which is indicated by your parameters having sufficient state changes as the MCMC algorithm runs.
An extreme example: if you set the proposal variance to 0, the posterior parameter estimate will never change. Your algorithm would report that you have the best parameter estimate, but it never explored enough parameter values to determine whether this truly is the best fit. If you set the proposal variance too high, you get a similar problem: a new candidate parameter is less likely to be related to your data, so the log-likelihood calculated with the new parameter is unlikely to be better than the log-likelihood using the old parameter. (For example, if your "true" parameter is 0.5 and your initial estimate is 2, but you are proposing from a normal distribution with a mean of 2 and a variance of 10,000, then a draw is unlikely to be closer to 0.5 than your initial estimate of 2 already is.)
You need to select a proposal variance that lets your parameter states change enough that you don't get stuck at local minima and maxima of the log-likelihood surface, yet is fine enough that you get reasonable parameter estimates. Much of the literature suggests tuning so that your parameters change state 40-60% of the time.
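As a concrete illustration of this tuning, here is a minimal random-walk Metropolis sketch in Python. The target density, starting point, and proposal scale are all hypothetical choices for illustration; the point is that the sampler reports its acceptance rate so you can check whether your proposal variance puts it in a reasonable range:

```python
import numpy as np

def metropolis(log_lik, x0, proposal_sd, n_iter=5000, seed=0):
    """Random-walk Metropolis sampler; returns the chain and acceptance rate."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    x, accepted = x0, 0
    ll = log_lik(x)
    for i in range(n_iter):
        prop = x + rng.normal(0.0, proposal_sd)   # candidate from N(x, sd^2)
        ll_prop = log_lik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:  # Metropolis accept/reject step
            x, ll = prop, ll_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_iter

# Toy target: a standard normal log-density (hypothetical example).
log_lik = lambda x: -0.5 * x * x
chain, rate = metropolis(log_lik, x0=4.0, proposal_sd=2.4)
```

If `rate` comes out near 0 the proposal variance is too large (almost every candidate is rejected); if it is near 1 the variance is too small (the chain takes tiny steps and explores slowly).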
One other reason for the trace plots is burn-in. Usually the burn-in period is obvious in the plot (for example, if the true parameter is 1.5 and your initial estimate is 4, you should see the parameter estimates move quickly from 4 to 1.5 and then "bounce" around 1.5). Typically you just exclude the first n iterations, where n is large enough that you are certain to have removed the burn-in (say 1,000). But if the computations are time-consuming, or if your parameter estimates take much longer to converge than your n allows, you may want to omit more (or fewer) iterations to account for burn-in. Check your plots to see where the burn-in period ends, to make sure burn-in is not affecting your results.
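To make the burn-in exclusion concrete, here is a small Python sketch using a synthetic chain (the decay rate, noise level, and cutoff are hypothetical) that mimics the example above: a chain started at 4 that drifts toward a true value of 1.5, with the first 1,000 draws discarded before summarizing:

```python
import numpy as np

# Synthetic chain: starts near 4, decays toward the true value 1.5,
# then bounces around it. The exponential drift stands in for burn-in.
rng = np.random.default_rng(1)
n, burn = 5000, 1000
drift = 2.5 * np.exp(-np.arange(n) / 100.0)   # ~0 well before iteration 1000
chain = 1.5 + drift + rng.normal(0.0, 0.2, size=n)

# Drop the first `burn` draws before computing posterior summaries.
post_burn = chain[burn:]
```

The mean of the full chain is biased upward by the early transient, while the post-burn-in mean sits close to 1.5, which is exactly what inspecting the trace plot is meant to catch.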
Note that I have been talking in the context of parameter point estimates. If you are estimating parameter variance, then ensuring that you have appropriate state changes is even more important.
There is no reason to claim that MCMC sampling is the "best" Monte Carlo method! On the contrary, it is usually worse than iid sampling, at least in terms of the variance of the resulting Monte Carlo estimators $$\frac{1}{T}\sum_{t=1}^T h(X_t).$$ Indeed, while this average converges to the expectation $\mathbb{E}_{\pi}[h(X)]$ when $\pi$ is the stationary and limiting distribution of the Markov chain $(X_t)_t$, there are at least two drawbacks to using MCMC methods:
- The chain needs to "reach stationarity", meaning that it needs to forget about its starting value $X_0$. In other words, $t$ must be "large enough" for $X_t$ to be distributed from $\pi$. Sometimes "large enough" may exceed by several orders of magnitude the computing budget for the experiment.
- The values $X_t$ are correlated, leading to an asymptotic variance that involves $$\text{var}_\pi(h(X))+2\sum_{t=1}^\infty\text{cov}_\pi(h(X_0),h(X_t)),$$ which generally exceeds $\text{var}_\pi(h(X))$ and hence requires longer simulations than for an iid sample.
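The variance inflation from correlated draws is often summarized as an effective sample size (ESS). Here is a simple Python sketch (the truncation rule and the AR(1) test chain are illustrative choices, not the only or best estimator) showing that a highly autocorrelated chain of length T carries far fewer than T independent draws' worth of information:

```python
import numpy as np

def effective_sample_size(x, max_lag=200):
    """ESS = T / (1 + 2 * sum of positive autocorrelations).
    Truncates the sum at the first non-positive autocorrelation,
    a simple and common heuristic."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    var = xc @ xc / T
    tau = 1.0
    for lag in range(1, min(max_lag, T - 1)):
        rho = (xc[:-lag] @ xc[lag:]) / ((T - lag) * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return T / tau

# AR(1) chain with correlation 0.9 as a stand-in for correlated MCMC output.
rng = np.random.default_rng(2)
phi, T = 0.9, 20000
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()

ess = effective_sample_size(x)  # far below T because of the correlation
```

For an AR(1) chain the theoretical ratio is $(1-\phi)/(1+\phi)$, so with $\phi = 0.9$ only about 5% of the 20,000 draws count as effectively independent.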
This being said, MCMC is very useful for handling settings where regular iid sampling is impossible or too costly and where importance sampling is quite difficult to calibrate, in particular because of the dimension of the random variable to be simulated.
However, sequential Monte Carlo methods like particle filters may be more appropriate in dynamical models, where the data arrives in bursts that need immediate attention and may even vanish (i.e., cannot be stored) after a short while.
In conclusion, MCMC is a very useful (and very much used) tool to handle complex settings where regular Monte Carlo solutions fail.
Best Answer
I use the Gelman-Rubin convergence diagnostic as well. A potential problem with Gelman-Rubin is that it may mis-diagnose convergence if the shrink factor happens to be close to 1 by chance; in that case you can use a Gelman-Rubin-Brooks plot. See the paper "General Methods for Monitoring Convergence of Iterative Simulations" for details. The diagnostic is supported in the coda package in R ("Output analysis and diagnostics for Markov Chain Monte Carlo simulations").
The coda package also includes other convergence diagnostics (such as Geweke's). You can also have a look at "boa: An R Package for MCMC Output Convergence Assessment and Posterior Inference".
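To show what the Gelman-Rubin shrink factor computes, here is a minimal Python sketch of the basic potential scale reduction factor (coda's `gelman.diag` in R is the full implementation; the chains below are synthetic, and the formula omits the degrees-of-freedom correction):

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor R-hat for m chains of
    length n, following Gelman & Rubin (1992). chains: shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))            # 4 chains, same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain off target

r_ok = gelman_rubin(mixed)    # near 1: chains agree
r_bad = gelman_rubin(stuck)   # well above 1: chains have not mixed
```

When the chains explore the same distribution, between-chain variance is small relative to within-chain variance and R-hat is close to 1; a chain stuck in a different region inflates the between-chain term, which is exactly the mis-diagnosis risk the Gelman-Rubin-Brooks plot helps you inspect over iterations.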