Solved – Posterior autocorrelation in Pymc. How to interpret it

bayesianprobabilistic-programmingpymc

I started learning Bayesian inference by reading "Probabilistic Programming and Bayesian Methods for Hackers". I found something that is not really clear for me in the third chapter. Lets look at the small part from the original text.

A chain that is [Isn't meandering exploring?] exploring the space well will exhibit very high autocorrelation. Visually, if the trace seems to meander like a river, and not settle down, the chain will have high autocorrelation.
This does not imply that a converged MCMC has low autocorrelation. Hence low autocorrelation is not necessary for convergence, but it is sufficient. PyMC has a built-in autocorrelation plotting function in the Matplot module.

If autocorrelation is low, it's sufficient condition for good convergence. But if autocorrelation is high, what can we say about the convergence?

If autocorrelation is low, does it mean that parameter's space was explored bad? How can we improve the sampler to converge and to explore the full parameters space at the same time?

Best Answer

Autocorrelation dictates the amount of time you have to wait for convergence. If autocorrelation is high, you will have to use a longer burn-in, and you will have to draw more samples after the burn-in to get a good estimate of the posterior distribution.
Low autocorrelation means good exploration. Exploration and convergence are essentially the same thing. If a sampler explores well, then it will converge fast, and vice versa. Part of the confusion here is that the book does not explain the concept of convergence very well. You can find a much better definition of convergence on Wikipedia.

Related Solutions

Solved – 2-Gaussian mixture model inference with MCMC and PyMC

The problem is caused by the way that PyMC draws samples for this model. As explained in section 5.8.1 of the PyMC documentation, all elements of an array variable are updated together. For small arrays like center this is not a problem, but for a large array like category it leads to a low acceptance rate. You can see the acceptance rate via

print mcmc.step_method_dict[category][0].ratio

The solution suggested in the documentation is to use an array of scalar-valued variables. In addition, you need to configure some of the proposal distributions since the default choices are bad. Here is the code that works for me:

import pymc as pm
sigmas = pm.Normal('sigmas', mu=0.1, tau=1000, size=2)
centers = pm.Normal('centers', [0.3, 0.7], [1/(0.1)**2, 1/(0.1)**2], size=2)
alpha  = pm.Beta('alpha', alpha=2, beta=3)
category = pm.Container([pm.Categorical("category%i" % i, [alpha, 1 - alpha]) 
                         for i in range(nsamples)])
observations = pm.Container([pm.Normal('samples_model%i' % i, 
                   mu=centers[category[i]], tau=1/(sigmas[category[i]]**2), 
                   value=samples[i], observed=True) for i in range(nsamples)])
model = pm.Model([observations, category, alpha, sigmas, centers])
mcmc = pm.MCMC(model)
# initialize in a good place to reduce the number of steps required
centers.value = [mu1_true, mu2_true]
# set a custom proposal for centers, since the default is bad
mcmc.use_step_method(pm.Metropolis, centers, proposal_sd=sig1_true/np.sqrt(nsamples))
# set a custom proposal for category, since the default is bad
for i in range(nsamples):
    mcmc.use_step_method(pm.DiscreteMetropolis, category[i], proposal_distribution='Prior')
mcmc.sample(100)  # beware sampling takes much longer now
# check the acceptance rates
print mcmc.step_method_dict[category[0]][0].ratio
print mcmc.step_method_dict[centers][0].ratio
print mcmc.step_method_dict[alpha][0].ratio

The proposal_sd and proposal_distribution options are explained in section 5.7.1. For the centers, I set the proposal to roughly match the standard deviation of the posterior, which is much smaller than the default due to the amount of data. PyMC does attempt to tune the width of the proposal, but this only works if your acceptance rate is sufficiently high to begin with. For category, the default proposal_distribution = 'Poisson' which gives poor results (I don't know why this is, but it certainly doesn't sound like a sensible proposal for a binary variable).

Solved – causing autocorrelation in MCMC sampler

When using Markov chain Monte Carlo (MCMC) algorithms in Bayesian analysis, often the goal is to sample from the posterior distribution. We resort to MCMC when other independent sampling techniques are not possible (like rejection sampling). The problem however with MCMC is that the resulting samples are correlated. This is because each subsequent sample is drawn by using the current sample.

There are two main MCMC sampling methods: Gibbs sampling and Metropolis-Hastings (MH) algorithm.

Autocorrelation in the samples is affected by a lot of things. For example, when using MH algorithms, to some extent you can reduce or increase your autocorrelations by adjusting the step size of proposal distribution. In Gibbs sampling however, there is no such adjustment possible. The autocorrelation is also affected by starting values of the Markov chain. There is generally an (unknown) optimum starting value that leads to the comparatively less autocorrelation. Multi-modality of the target distribution can also greatly affect the autocorrelation of the samples. Thus there are attributes of the target distribution that can definitely dictate autocorrelation. But most often autocorrelation is dictated by the sampler used. Broadly speaking if an MCMC sampler jumps around the state space more, it is probably going to have smaller autocorrelation. Thus, it is often desirable to choose samplers that make large accepted moves (like Hamiltonian Monte Carlo).
I am unfamiliar with JAGS.
If you have decided on the sampler already, and do not have the option of playing around with other samplers, then the best bet would be to do some preliminary analysis to find good starting values and step sizes. Thinning is not generally suggested since it is argued that throwing away samples is less efficient than using correlated samples. A universal solution is to run the sampler for a long time, so that you Effective Sample Size (ESS) is large. Look at the R package mcmcse here. If you look at the vignette on Page 8, the author proposes a calculation of the minimum effective samples one would need for their estimation process. You can find that number for your problem, and let the Markov chain run until you have that many effective samples.

Best Answer

Related Solutions

Solved – 2-Gaussian mixture model inference with MCMC and PyMC

Solved – causing autocorrelation in MCMC sampler

Related Question