[Math] Does MCMC overcome the curse of dimensionality

computational-complexity, markov-chains, reference-request, st.statistics

I want to compute an integral like this

$$\frac{\int_y g(y) e^{-\beta f(y)} \text{d} y } {\int_y e^{-\beta f(y)} \text{d} y}$$

where $f(y)$ is not necessarily convex and the dimension $d$ of $y$ is large.

This problem can be viewed as integrating a function with respect to a density whose normalizing constant is unknown, so MCMC seems like a natural choice. However, for this general problem I have not found any literature giving the convergence rate with respect to the dimension $d$.

Remark: I know two ways to implement MCMC here. One is the Metropolis–Hastings algorithm; the other is the Metropolis-adjusted Langevin algorithm (MALA), which simulates the SDE $dX(t) = -\nabla f(X(t))\,dt + \sqrt{2\beta^{-1}}\,dB(t)$. In the latter case we also need to account for the discretization error of the SDE.
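For concreteness, here is a minimal sketch of MALA as described above, assuming user-supplied callables `f` and `grad_f` and a fixed step size `h` (these names and defaults are my own placeholders, not anything from the question). It takes an Euler–Maruyama step of the SDE and then applies a Metropolis–Hastings correction for the discretization error.

```python
import numpy as np

def mala(f, grad_f, y0, beta=1.0, h=1e-2, n_steps=10_000, rng=None):
    """Metropolis-adjusted Langevin: Euler step of the SDE + MH correction.

    Targets the unnormalized density exp(-beta * f(y)).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = y0.size
    y = y0.copy()
    samples = np.empty((n_steps, d))

    def log_q(y_to, y_from):
        # log density of the Euler proposal N(y_from - h * grad_f(y_from), (2h/beta) I)
        mean = y_from - h * grad_f(y_from)
        return -beta * np.sum((y_to - mean) ** 2) / (4.0 * h)

    for t in range(n_steps):
        prop = y - h * grad_f(y) + np.sqrt(2.0 * h / beta) * rng.standard_normal(d)
        log_alpha = (-beta * f(prop) + beta * f(y)
                     + log_q(y, prop) - log_q(prop, y))
        if np.log(rng.uniform()) < log_alpha:
            y = prop
        samples[t] = y
    return samples
```

The ratio of integrals above would then be estimated by averaging $g$ over the retained samples after a burn-in, e.g. `np.mean([g(s) for s in samples[burn_in:]])`.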

I guess that without convexity assumptions on $f(y)$ the convergence rate could be as slow as $\mathcal{O}(e^{-d})$, i.e. exponentially small in $d$. If this is true, doesn't MCMC suffer from the curse of dimensionality?

The reason I ask is that one could instead use a plain Monte Carlo method, i.e. sample uniformly distributed random variables and estimate the integrals in the numerator and the denominator separately, and, as we know, the variance of this estimate is not related to $d$.
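As a point of comparison, here is a minimal sketch of that plain Monte Carlo approach, under the extra assumption (not stated in the question) that the integrals run over a bounded box $[a,b]^d$, since uniform sampling needs a bounded domain; all names and defaults are placeholders.

```python
import numpy as np

def plain_mc_ratio(f, g, d, a=-1.0, b=1.0, beta=1.0, n=100_000, rng=None):
    """Estimate E_pi[g] with pi ∝ exp(-beta * f) on the box [a, b]^d
    by estimating numerator and denominator with the same uniform draws.
    The common volume factor (b - a)^d cancels in the ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = rng.uniform(a, b, size=(n, d))
    w = np.exp(-beta * np.apply_along_axis(f, 1, y))   # unnormalized weights
    gy = np.apply_along_axis(g, 1, y)
    return np.sum(gy * w) / np.sum(w)
```

Viewed this way, the estimator is simply self-normalized importance sampling with a uniform proposal.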

I am very confused about the role MCMC plays in high-dimensional problems. Can someone help me figure this out? Any references are much appreciated.

Best Answer

You need global convexity to enjoy the optimal convergence rate; otherwise, even with local convexity, you will almost surely (not in the probabilistic sense) end up with the worst-case rate you pointed out.

MCMC (Markov chain Monte Carlo) does not overcome the curse of dimensionality. Quite the contrary: Bayesians are working very hard in two directions to solve the problems caused by the high dimension of the parameter space.

(1) The ABC (Approximate Bayesian Computation) scheme. This originated from Laplace's approximation of an exponential integral, and its main idea is to discard, at each step, those proposals whose simulated data are "dissimilar" from the observed data. But this method suffers from a failure to detect outliers and from over-concentration of the posterior.
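A minimal sketch of basic ABC rejection sampling, assuming a prior sampler `sample_prior`, a data simulator `simulate`, a distance `dist` on (summaries of) data, and a tolerance `eps` — all hypothetical placeholders, since the answer does not commit to a particular ABC variant:

```python
import numpy as np

def abc_rejection(sample_prior, simulate, dist, observed, eps, n_accept, rng=None):
    """Basic ABC rejection: keep a proposed parameter only if its simulated
    data fall within eps of the observed data under the chosen distance."""
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior(rng)
        x_sim = simulate(theta, rng)
        if dist(x_sim, observed) < eps:
            accepted.append(theta)
    return np.array(accepted)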

On the other hand, even if this method works well during sampling, we do not have any consistency guarantee: when the nuisance parameters $\theta_2$ make up the majority of the parameter space $\boldsymbol{\theta}=(\theta_1,\theta_2)$, the Neyman–Scott example shows that the information about the parameter of interest $\theta_1$ brought in by the samples can be nullified. That is also a reason why regularization methods are so popular in high-dimensional inference problems: they simply introduce a weighted norm that cannot be washed out by the samples.
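For reference, a standard formulation of the Neyman–Scott example alluded to here (my own recap, not spelled out in the original answer): paired observations with individual nuisance means and a common variance of interest,

$$X_{i1}, X_{i2} \sim \mathcal{N}(\mu_i, \sigma^2) \ \text{independently}, \quad i=1,\dots,n, \qquad \hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{2n}\sum_{i=1}^n \sum_{j=1}^2 \big(X_{ij}-\bar{X}_i\big)^2 \;\longrightarrow\; \frac{\sigma^2}{2},$$

so the maximum-likelihood estimate of $\sigma^2$ remains inconsistent no matter how much data arrive, because the number of nuisance parameters $\mu_i$ grows with the sample size.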

(2) Scalable methods. These are sometimes referred to as divide-and-conquer methods. The Bayesian model, or more precisely the parameter space, is decomposed into subspaces, so the high-dimensional problem is divided into many low-dimensional problems and the single $O(n^d)$ problem becomes $k$ problems of order $O(n^{d/k})$. But the drawback is also obvious: decomposing and then recombining parameter spaces artificially deletes and adds correlations between the subspaces, so prior information about $\boldsymbol\theta$ is lost.
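As a caricature of this decomposition (my own sketch, not any specific published scheme), the code below splits $\boldsymbol\theta$ into blocks and lets each block be sampled by its own low-dimensional kernel while the other blocks are held fixed; `sample_block` is a hypothetical placeholder. The concatenation at the end is precisely where correlations between blocks are dropped, which is the drawback mentioned above.

```python
import numpy as np

def blockwise_mcmc(sample_block, theta0, block_slices, n_steps, rng=None):
    """Naive divide-and-conquer: each block of the parameter vector is
    sampled by its own low-dimensional MCMC kernel while the other blocks
    stay at their initial values, then the block chains are concatenated.
    Correlations *between* blocks are ignored by construction.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0.copy()
    chains = []
    for sl in block_slices:                                 # e.g. [slice(0, 10), slice(10, 20), ...]
        block_chain = sample_block(sl, theta, n_steps, rng)  # shape (n_steps, block_dim)
        chains.append(block_chain)
    return np.concatenate(chains, axis=1)                   # shape (n_steps, d)
```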
