Solved – Can machine learning or deep learning algorithms be utilised to “improve” the sampling process of an MCMC technique?

machine learning, markov-chain-montecarlo, markov-process, monte carlo

Based on the little I know about MCMC (Markov chain Monte Carlo) methods, I understand that sampling is a crucial part of the technique. The most commonly used sampling methods are Hamiltonian Monte Carlo and Metropolis.

Is there a way to utilise machine learning or even deep learning to construct a more efficient MCMC sampler?

Best Answer

Yes. Unlike what other answers state, 'typical' machine-learning methods such as nonparametrics and (deep) neural networks can help create better MCMC samplers.

The goal of MCMC is to draw samples from an (unnormalized) target distribution $f(x)$. The samples are used to approximate $f$; their main use is to compute expectations of functions under $f$ (i.e., high-dimensional integrals) and, in particular, properties of $f$ such as its moments.
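To make this concrete, here is a minimal sketch of a random-walk Metropolis sampler that only needs the unnormalized log-density and then estimates two moments from the draws. The target `log_f` (a 1-D Gaussian with mean 2 and standard deviation 0.5), the step size, and the burn-in length are illustrative assumptions of mine, not something taken from the question.

```python
import numpy as np

def log_f(x):
    # Illustrative unnormalized log-target: Gaussian with mean 2, std 0.5.
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def metropolis(log_f, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_f(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.normal()      # symmetric proposal
        lp_prop = log_f(prop)               # one evaluation of f per iteration
        # Accept with probability min(1, f(prop) / f(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

samples = metropolis(log_f, x0=0.0, n_samples=20000)
kept = samples[2000:]        # discard an (assumed) burn-in period
mean_est = kept.mean()       # Monte Carlo estimate of E[x] under f
var_est = kept.var()         # Monte Carlo estimate of Var[x] under f
```

Note that every iteration costs one evaluation of $f$, which is exactly what becomes a problem when $f$ is expensive.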

Sampling usually requires a large number of evaluations of $f$ and, for methods such as Hamiltonian Monte Carlo (HMC), of its gradient as well. If $f$ is costly to evaluate, or the gradient is unavailable, it is sometimes possible to build a less expensive surrogate function that helps guide the sampling and is evaluated in place of $f$ (in a way that still preserves the properties of MCMC).

For example, a seminal paper (Rasmussen 2003) proposes to use Gaussian processes (a nonparametric function approximation method) to build an approximation to $\log f$ and to perform HMC on the surrogate function, with only the acceptance/rejection step of HMC based on $f$. This reduces the number of evaluations of the original $f$, and makes it possible to perform MCMC on pdfs that would otherwise be too expensive to evaluate.
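A rough sketch of this idea, under my own simplifying assumptions and not Rasmussen's actual implementation: fit a small GP (RBF kernel, written out in NumPy) to a few evaluations of $\log f$, run the leapfrog integrator with the gradient of the GP posterior mean, and evaluate the true $f$ only once per proposal in the accept/reject step. The 1-D Gaussian target, the training grid, the length scale `ell`, and the step settings are all illustrative choices. The surrogate is kept fixed during sampling, so the leapfrog map remains a valid (reversible, volume-preserving) proposal.

```python
import numpy as np

def log_f(x):
    # Stand-in for an "expensive" unnormalized log-target: Gaussian, mean 1, std 1.
    return -0.5 * (x - 1.0) ** 2

# Fit a GP surrogate to a handful of true evaluations of log f.
X_train = np.linspace(-4.0, 6.0, 15)
y_train = log_f(X_train)
ell, jitter = 1.5, 1e-6      # assumed length scale and numerical jitter

def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

K = rbf(X_train, X_train) + jitter * np.eye(X_train.size)
alpha = np.linalg.solve(K, y_train - y_train.mean())

def surrogate_grad(x):
    # Derivative of the GP posterior mean sum_i alpha_i * k(x, x_i).
    k = np.exp(-0.5 * (x - X_train) ** 2 / ell**2)
    return np.sum(-(x - X_train) / ell**2 * k * alpha)

def surrogate_hmc(n_samples, x0=0.0, eps=0.2, n_leapfrog=10, seed=0):
    rng = np.random.default_rng(seed)
    x, lp = x0, log_f(x0)
    n_true_evals = 1                       # count evaluations of the true f
    samples = np.empty(n_samples)
    for i in range(n_samples):
        p0 = rng.normal()
        x_new, p = x, p0
        # Leapfrog trajectory driven by the surrogate gradient only.
        p += 0.5 * eps * surrogate_grad(x_new)
        for step in range(n_leapfrog):
            x_new += eps * p
            if step < n_leapfrog - 1:
                p += eps * surrogate_grad(x_new)
        p += 0.5 * eps * surrogate_grad(x_new)
        # Accept/reject with the TRUE target, so the chain still targets f.
        lp_new = log_f(x_new)
        n_true_evals += 1
        log_accept = (lp_new - 0.5 * p**2) - (lp - 0.5 * p0**2)
        if np.log(rng.uniform()) < log_accept:
            x, lp = x_new, lp_new
        samples[i] = x
    return samples, n_true_evals

samples, n_true_evals = surrogate_hmc(5000)
```

Standard HMC with the same settings would need ten gradient evaluations of the true target per proposal; here the sampler needs one evaluation of $f$ per proposal, plus the 15 evaluations used to fit the surrogate. A poor surrogate does not bias the samples, it only lowers the acceptance rate.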

The idea of using surrogates to speed up MCMC has been explored extensively in the past few years, essentially by trying different ways to build the surrogate function and to combine it efficiently and adaptively with different MCMC methods (in a way that preserves the 'correctness' of MCMC sampling). Related to your question, two recent papers use advanced machine learning techniques to build the surrogate function: random networks (Zhang et al. 2015) and adaptively learnt kernel exponential families (Strathmann et al. 2015).

HMC is not the only form of MCMC that can benefit from surrogates. For example, Nishihara et al. (2014) build an approximation of the target density by fitting a multivariate Student's $t$ distribution to the multi-chain state of an ensemble sampler, and use this to perform a generalized form of elliptical slice sampling.

These are only examples. In general, a number of distinct ML techniques (mostly in the area of function approximation and density estimation) can be used to extract information that might improve the efficiency of MCMC samplers. Their actual usefulness -- e.g. measured in number of "effective independent samples per second" -- is conditional on $f$ being expensive or somewhat hard to compute; also, many of these methods may require tuning of their own or additional knowledge, restricting their applicability.
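If you want to compare samplers on the "effective samples per second" metric, one simple (and fairly crude) estimator of the effective sample size is $N / (1 + 2\sum_k \rho_k)$, with the sum over autocorrelations $\rho_k$ truncated at the first non-positive lag. The sketch below is one possible implementation under that assumed truncation rule; the AR(1) chain is just a synthetic stand-in for correlated MCMC output, and `ess_per_second` is a hypothetical helper name.

```python
import numpy as np

def autocorr(x):
    """Normalized autocorrelation of a 1-D chain, computed via FFT."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    f = np.fft.rfft(x, n=2 * n)            # zero-pad to avoid circular wrap-around
    acov = np.fft.irfft(f * np.conj(f))[:n]
    return acov / acov[0]

def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    x = np.asarray(x, dtype=float)
    rho = autocorr(x)
    tau = 1.0
    for k in range(1, rho.size):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return x.size / tau

def ess_per_second(x, elapsed_seconds):
    # Hypothetical helper: ESS divided by the wall-clock time spent sampling.
    return effective_sample_size(x) / elapsed_seconds

# Synthetic example: a strongly correlated AR(1) chain versus independent draws.
rng = np.random.default_rng(0)
n, phi = 50000, 0.9
noise = rng.normal(size=n)
ar = np.empty(n)
ar[0] = noise[0]
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + noise[t]

ess_ar = effective_sample_size(ar)         # far fewer effective samples than n
ess_iid = effective_sample_size(noise)     # close to n for independent draws
```

A surrogate-based sampler is only worth it if the gain in this metric outweighs the cost of building and tuning the surrogate.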

References:

  1. Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals." Bayesian Statistics 7. 2003.

  2. Zhang, Cheng, Babak Shahbaba, and Hongkai Zhao. "Hamiltonian Monte Carlo Acceleration using Surrogate Functions with Random Bases." arXiv preprint arXiv:1506.05555 (2015).

  3. Strathmann, Heiko, et al. "Gradient-free Hamiltonian Monte Carlo with efficient kernel exponential families." Advances in Neural Information Processing Systems. 2015.

  4. Nishihara, Robert, Iain Murray, and Ryan P. Adams. "Parallel MCMC with generalized elliptical slice sampling." Journal of Machine Learning Research 15.1 (2014): 2087-2112.