[Math] Why does Monte-Carlo integration work better than naive numerical integration in high dimensions

integration, monte-carlo, numerical-methods

Can anyone explain simply why Monte-Carlo works better than naive Riemann integration in high dimensions? I do not understand how choosing the evaluation points randomly can yield a more precise result than distributing these points evenly over the domain.

More precisely:

Let $f:[0,1]^d \to \mathbb{R}$ be a continuous bounded integrable function, with $d\geq3$. I want to compute $A=\int_{[0,1]^d} f(x)\,dx$ using $n$ points. I compare two simple methods.

The first method is the Riemann approach. Let $x_1, \dots, x_n$ be $n$ regularly spaced points in $[0,1]^d$ and $A_r=\frac{1}{n}\sum_{i=1}^n f(x_i)$. I have that $A_r \to A$ as $n\to\infty$. The error will be of order $O(\frac{1}{n^{1/d}})$.

The second method is the Monte-Carlo approach. Let $u_1, \dots, u_n$ be $n$ points chosen independently and uniformly at random over $[0,1]^d$. Let $A_{mc}=\frac{1}{n}\sum_{i=1}^n f(u_i)$. The law of large numbers tells me that $A_{mc} \to A$ as $n\to \infty$, and the central limit theorem tells me that $A_{mc}-A$ is in the limit a Gaussian random variable centered at $0$ with variance $O(\frac{1}{n})$. So with high probability the error will be smaller than $\frac{C}{\sqrt{n}}$, where $C$ depends on $f$ (through its variance) but not explicitly on $d$.
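To make the comparison concrete, here is a minimal sketch of the two estimators on a smooth test function (the choice of $f$, $d$ and the grid size are purely illustrative; any integrand with a known value would do):

```python
import numpy as np

def f(x):
    # Illustrative smooth integrand on [0,1]^d; its exact integral is sin(1)^d.
    return np.prod(np.cos(x), axis=-1)

d, k = 6, 4                      # dimension and grid points per axis (illustrative choices)
n = k ** d                       # same evaluation budget for both methods

# Riemann-style estimate on a regular grid (using cell midpoints).
axis = (np.arange(k) + 0.5) / k
grid = np.stack(np.meshgrid(*([axis] * d), indexing="ij"), axis=-1).reshape(-1, d)
A_r = f(grid).mean()

# Monte-Carlo estimate with the same number of points.
rng = np.random.default_rng(0)
A_mc = f(rng.uniform(size=(n, d))).mean()

exact = np.sin(1.0) ** d
print(f"n = {n}: |A_r - A| = {abs(A_r - exact):.2e}, |A_mc - A| = {abs(A_mc - exact):.2e}")
```

Either estimator can win on a particular smooth $f$ at moderate $d$; the question is how the errors scale with $n$ as $d$ grows or $f$ gets rougher.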

An obvious problem with the Riemann approach is that if I want to increase the number of points while keeping a regular grid I have to go from $n=k^d$ to $n=(k+1)^d$, which adds a lot of points. I do not have this problem with Monte-Carlo.
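For example, with $d=10$ the smallest refinement of a $2^{10}=1024$-point grid is a $3^{10}=59049$-point grid, whereas Monte-Carlo lets me add points one at a time.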

But if the number of points is fixed at $n$, does Monte-Carlo really yield better results than Riemann? It seems true in most cases. But I do not understand how choosing the points randomly can be better. Does anybody have an intuitive explanation for this?

Best Answer

I don't think it's fair to compare Riemann integration to Monte Carlo. First off, using the usual Riemann sums is a terrible way of computing integrals, especially for integrands that oscillate wildly or are just plain naughty in certain areas. Just look at $d=1$: the Riemann sum error is $O(1/n)$ whereas the trapezoid rule gives $O(1/n^2)$, assuming your function is twice continuously differentiable. So for starters, instead of Riemann integration one has a plethora of numerical quadrature methods available. There are dozens of such methods, and some adaptive methods try to concentrate the quadrature sample points in areas where the function behaves badly. Something like the midpoint rule is a typical non-adaptive way of obtaining better accuracy by simply evaluating at the midpoint of each subinterval.
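To see those rates, here is a quick sketch in one dimension (the exponential integrand is just an arbitrary smooth example):

```python
import numpy as np

def g(x):
    return np.exp(x)                       # arbitrary smooth test integrand on [0, 1]

exact = np.e - 1.0                         # exact value of the integral
for n in (10, 100, 1000):
    x = np.linspace(0.0, 1.0, n + 1)
    left_riemann = g(x[:-1]).mean()                     # plain Riemann sum: error O(1/n)
    trapezoid = ((g(x[:-1]) + g(x[1:])) / 2).mean()     # trapezoid rule: error O(1/n^2)
    print(n, abs(left_riemann - exact), abs(trapezoid - exact))
```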

However, as Hagen pointed out, you have to deal with the curse of dimensionality. Quadrature methods start failing very badly in higher dimensions because you need an enormous number of points. So you need to be smart about it. You need to focus on the regions that contribute most to the integral. Something like VEGAS Monte Carlo takes advantage of this by sampling more heavily near areas of large function values. By doing so, you are concentrating your quadrature points where the function behaves the worst.
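VEGAS itself learns where the integrand is large adaptively; the sketch below only illustrates the underlying importance-sampling idea with a hand-picked Gaussian proposal (my own illustrative stand-in, not what VEGAS actually builds):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Sharply peaked 1-D integrand: almost all of its mass sits near x = 0.5.
    return np.exp(-((x - 0.5) ** 2) / (2 * 0.01 ** 2))

n = 10_000

# Plain Monte Carlo: uniform samples, most of which land where f is ~0.
plain = f(rng.uniform(size=n)).mean()

# Importance sampling: draw from a Gaussian proposal q centred on the peak,
# then reweight by f/q so the estimator still targets the same integral.
sigma = 0.05
x = rng.normal(0.5, sigma, size=n)
q = np.exp(-((x - 0.5) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
inside = (x >= 0.0) & (x <= 1.0)           # only [0, 1] contributes to the integral
importance = np.where(inside, f(x) / q, 0.0).mean()

print(plain, importance)                   # same target (~0.0251); far lower variance on the right
```

The reweighting by $f/q$ keeps the estimator unbiased while most of the samples land where the integrand actually matters.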

The simple Monte Carlo integration algorithm just samples random points. If your function isn't too crazy, this will tend to do slightly better than a fixed-size grid, especially on functions that vary (but not too much!). As a thought experiment, imagine trying to integrate something very sharply peaked and zero elsewhere. Then Monte Carlo will do a terrible job, because only a few sample points land where the function is nonzero, whereas Riemann integration may be slightly better, especially if the width of the peak is a bit bigger than the grid spacing. On the other end of the spectrum, consider a function whose derivative goes from small to large in small bursts. Then Riemann integration will be pretty terrible, but Monte Carlo will tend to average out better.
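Here is a 1-D caricature of that second failure mode (the specific oscillating integrand is just an illustration): a regular grid can be systematically fooled by a function that varies on exactly the grid scale, while random points average it out:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Rapidly oscillating integrand; its exact integral over [0, 1] is 0.5.
    return np.sin(2 * np.pi * 100 * x) ** 2

n = 100                                # same budget for both estimators
grid = np.arange(n) / n                # regular grid whose spacing matches the oscillation period
riemann = f(grid).mean()               # every grid point hits a zero of the sine: estimate ~0.0
monte_carlo = f(rng.uniform(size=n)).mean()

print(riemann, monte_carlo)            # ~0.0 (systematically wrong) vs. roughly 0.5
```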
