Why do we want low autocorrelation for MCMC convergence?

autocorrelation bayes markov-chain-montecarlo

Autocorrelation is commonly used as a diagnostic tool for judging the convergence of an MCMC run. Low autocorrelation is desired because it is taken to mean that the parameter space is well explored.

I really struggle with this. Assume that we have a lot of data and the highest density interval of the posterior is very narrow, so most of the density falls within a small range of parameter values. If we derive this from an MCMC chain, the parameter values at successive steps of the chain would not differ much. From my understanding, this would also imply that the autocorrelation is high.

What am I misunderstanding here?

Best Answer

There are two things going on here that both lead to wanting low autocorrelation, but from slightly different angles.

First, whatever sampler you have (e.g. Metropolis-Hastings, a Gibbs sampler, whatever), you would like very little autocorrelation in its samples. This is easiest to see by thinking about the MCMC error: for the posterior mean, for example, the MCMC error for a given number of samples will be lower if the samples are weakly correlated than if they are strongly correlated, because strongly correlated samples carry less independent information. That's the easy explanation, but it's worth thinking about more too. In general, a sampler that produces weakly correlated samples is preferred over one that produces strongly correlated samples.
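To make the MCMC error point concrete, here is a minimal sketch that does not use a real sampler. It assumes an AR(1) process as a stand-in for a chain, with a hypothetical parameter `rho` for the lag-1 autocorrelation, and compares how much the chain mean varies across repeated runs. Both settings have the same N(0, 1) stationary distribution, so any difference comes from the correlation between successive draws, not from how wide or narrow the target is. The function names are made up for this illustration.

```python
import numpy as np

def ar1_chain(n, rho, rng):
    """Draw n samples from a stationary AR(1) process with N(0, 1) marginals.

    This is an assumed stand-in for MCMC output, not an actual sampler:
    rho is the lag-1 autocorrelation of the draws.
    """
    noise = rng.normal(size=n)
    innovation_sd = np.sqrt(1.0 - rho**2)  # keeps the marginal variance at 1
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + innovation_sd * noise[t]
    return x

def spread_of_chain_means(rho, n=1000, n_chains=200, seed=0):
    """Standard deviation of the chain mean across independent chains.

    This is an empirical version of the MCMC error of the posterior mean.
    """
    rng = np.random.default_rng(seed)
    means = [ar1_chain(n, rho, rng).mean() for _ in range(n_chains)]
    return float(np.std(means))

weak = spread_of_chain_means(rho=0.1)    # weakly correlated draws
strong = spread_of_chain_means(rho=0.9)  # strongly correlated draws

print("spread of the mean, rho=0.1:", weak)
print("spread of the mean, rho=0.9:", strong)
```

For a stationary chain, the variance of the sample mean is roughly (sigma^2 / N) * (1 + 2 * sum of the lag-k autocorrelations), so with the same number of draws the strongly correlated chain should show a noticeably larger spread.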

Second, which I think is more what you are getting at, is using the empirical autocorrelation to determine what "burn-in" to remove. The reasoning behind this is that we typically don't have great starting points; we often begin far from the mode. Including these starting points in a finite sample can add bias, as we over-represent them by beginning there and potentially drifting only slowly toward the mode. Some samplers "drift" faster than others. Once we get close to the mode, we should essentially be bouncing around it. But herein lies the important point: when we are far from the mode, many samplers will have a strong 'tug' back toward the mode, as moving toward the mode will likely greatly increase the posterior probability. However, when we are close to the mode, our sampler should be taking random jumps, with much less 'tug' toward the mode (because the samples are much closer). The stronger the tug, the higher the autocorrelation. Thus, if we observe heavy autocorrelation early in our chain, which then levels off after a while, this is indicative of being very far from the mode early on.
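The burn-in argument can be illustrated with a small experiment. The following is only a sketch under stated assumptions: a random-walk Metropolis sampler targeting a standard normal, deliberately started far from the mode at 50. The window sizes and the burn-in length of 500 are arbitrary choices for illustration, and the function names are hypothetical. It compares the lag-1 autocorrelation in an early window with that in a late window, and the chain mean with and without the early draws.

```python
import numpy as np

def rw_metropolis(n_steps, start, proposal_sd, rng):
    """Random-walk Metropolis chain targeting a standard normal density."""
    def log_target(x):
        return -0.5 * x**2  # log N(0, 1) up to an additive constant

    chain = np.empty(n_steps)
    x = start
    for t in range(n_steps):
        proposal = x + proposal_sd * rng.normal()
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        chain[t] = x
    return chain

def lag1_autocorr(x):
    """Empirical lag-1 autocorrelation of a 1-D sequence."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(1)
chain = rw_metropolis(n_steps=5000, start=50.0, proposal_sd=1.0, rng=rng)

early_acf = lag1_autocorr(chain[:100])    # while drifting toward the mode
late_acf = lag1_autocorr(chain[-1000:])   # while bouncing around the mode

full_mean = chain.mean()          # includes the far-away starting draws
trimmed_mean = chain[500:].mean() # assumed burn-in of 500 draws removed

print("lag-1 autocorrelation, early window:", early_acf)
print("lag-1 autocorrelation, late window:", late_acf)
print("mean with / without burn-in:", full_mean, trimmed_mean)
```

In the early window the estimate is dominated by the steady drift toward the mode, which is the 'tug' described above. Near the mode the autocorrelation does not drop to zero; it settles at whatever level the sampler's step size produces, and the mean that excludes the early draws should sit closer to the true value of 0.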