Is integration in high dimensions hard?

integration · monte-carlo · numerical-methods

Consider the problem of estimating the integral $\int_{[0,1]^d} {\rm d}^dx f(x)$ where $f : [0,1]^d \to [a,b]$, to within relative error $\epsilon > 0$. My intuition is that this is an extremely hard problem for $d \gg 1$. The following argument suggests otherwise. What am I missing?

Let $x_1,\ldots, x_n \sim U([0,1]^d)$ be i.i.d. samples from the uniform distribution over $[0,1]^d$. Then note that $\mathbb{E}f(x_1) = \int_{[0,1]^d} {\rm d}^dx f(x)$ and by Hoeffding,

$$
\mathbb{P}\left(\left| \frac{\frac{1}{n}\sum_{i=1}^n f(x_i) - \mathbb{E}f(x_1)}{\mathbb{E}f(x_1)} \right| \geq \epsilon \right) \leq 2 \exp\left(\frac{-2 n \epsilon^2 (\mathbb{E}f(x_1))^2}{(b-a)^2}\right)
$$

If, for simplicity, I assume $a > 0$, then $f(x) \geq a > 0$ implies $\mathbb{E}f(x_1) \geq a$, and hence $(\mathbb{E}f(x_1))^2 \geq a^2$. Then

$$
\mathbb{P}\left(\left| \frac{\frac{1}{n}\sum_{i=1}^n f(x_i) - \mathbb{E}f(x_1)}{\mathbb{E}f(x_1)} \right| \geq \epsilon \right) \leq 2 \exp\left(\frac{-2 n \epsilon^2 a^2}{(b-a)^2}\right)
$$

Hence, the number of samples required to achieve relative error $\epsilon$ with probability at least $1-\delta$ is
$$
n \geq \frac{1}{2\epsilon^2} \frac{(b-a)^2}{a^2} \log\frac{2}{\delta}
$$
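As a sanity check, here is a minimal numerical sketch of this bound. The integrand $f(x) = 1 + \bar{x}$ (the mean of the coordinates, so $f : [0,1]^d \to [1,2]$), the dimension, and the tolerances are all illustrative choices of mine, not part of the argument above:

```python
import numpy as np

# Hedged sketch: Monte Carlo estimate of int_{[0,1]^d} f(x) dx for the
# toy integrand f(x) = 1 + mean(x), which maps [0,1]^d into [a,b] = [1,2].
# The integrand and all parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)

d = 100            # dimension; note the sample-size bound does not depend on d
a, b = 1.0, 2.0    # range of f
eps, delta = 0.01, 0.05

# Sample size from the Hoeffding-based bound:
#   n >= (1 / (2 eps^2)) * ((b-a)^2 / a^2) * log(2 / delta)
n = int(np.ceil((b - a) ** 2 / (2 * eps**2 * a**2) * np.log(2 / delta)))

x = rng.random((n, d))                    # n i.i.d. uniform points in [0,1]^d
estimate = np.mean(1.0 + x.mean(axis=1))  # plain Monte Carlo average

true_value = 1.5                          # E f = 1 + E[mean(x)] = 3/2
rel_error = abs(estimate - true_value) / true_value
print(n, estimate, rel_error)             # rel_error < eps w.p. >= 1 - delta
```

With $\epsilon = 0.01$ and $\delta = 0.05$ this gives $n \approx 18{,}445$ samples regardless of $d$, and the observed relative error comfortably meets the target.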

So I can apparently efficiently estimate any integral in high dimensions, provided that the integrand is bounded away from zero? Seems too good to be true. Can we obtain similar bounds when $f(x)$ has indefinite sign?

Best Answer

No, integration in high dimensions has not suddenly become easy. The procedure presented above is plain Monte Carlo and thus is subject to the usual $\sim n^{-1/2}$ error convergence law. How does that square with the bound above?

Notice that the inequality above is probabilistic, so the correct interpretation is this: the probability of the estimate being off by more than a relative error $\epsilon$ decays exponentially in the number of samples taken, but its "lifetime" (the number of evaluations needed to shrink the probability bound by a factor of $1/e$) is proportional to $1/\epsilon^2$. That means that for the integral to converge adequately within a given error margin $\epsilon$, you need at least of order $1/\epsilon^2$ sampling points.

In light of this, let's do a calculation to make this more precise. If we insist that the integral has converged with confidence $100x\%$ ($x \leq 1$) for a given absolute error margin $\delta$, then we want

$$\mathbb{P}\left(\left| {\frac{1}{n}\sum_{i=1}^n f(x_i) - \mathbb{E}f(x_1)}\right| \geq \delta \right) \leq 2 \exp\left(\frac{-2 n \delta^2 }{(b-a)^2}\right)\leq 1- x$$

which implies that we need at least

$$n\geq\frac{(b-a)^2\ln(\frac{2}{1-x})}{2}\frac{1}{\delta^2}\equiv \frac{C^2}{\delta^2}$$

sample points to achieve convergence. Equivalently, with $n$ sampling points available, we have converged in probability with good confidence only for errors of order $\delta \geq \frac{C}{\sqrt{n}}$, which shows that this $n^{-1/2}$ error estimate is really the best we can do with the available sampling points.
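The $C/\sqrt{n}$ law is easy to verify empirically. Below is a hedged sketch (the 1-d integrand $f(x) = x$ and all parameters are toy choices of mine) that measures the RMS error of the plain Monte Carlo estimator over many independent trials and fits the slope on a log-log scale:

```python
import numpy as np

# Hedged sketch: empirically check the C / sqrt(n) error law for plain
# Monte Carlo, using the toy 1-d integrand f(x) = x with E f = 1/2.
# The scaling in n is the point, not this particular function.
rng = np.random.default_rng(1)

ns = np.array([100, 400, 1600, 6400])
trials = 1000
rms = []
for n in ns:
    samples = rng.random((trials, n))     # `trials` independent estimates
    estimates = samples.mean(axis=1)      # each approximates E f = 1/2
    rms.append(np.sqrt(np.mean((estimates - 0.5) ** 2)))
rms = np.array(rms)

# On a log-log scale the RMS error should fall with slope ~ -1/2,
# i.e. quadrupling n halves the error.
slope = np.polyfit(np.log(ns), np.log(rms), 1)[0]
print(rms, slope)
```

The fitted slope comes out close to $-1/2$, matching the $\delta \sim C/\sqrt{n}$ estimate above, independently of how smooth the integrand is.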

For a different treatment of plain Monte Carlo convergence see here.

EDIT: The new version of the question asks whether the relative error can be bounded in a similar manner. Strictly speaking, no, you cannot. The reason is that you have to be able to guarantee that $\mathbb{E}f(x_1) \neq 0$, since otherwise the random variable in the probability above involves a division by zero. Even if you can guarantee that, once $f$ is allowed to be negative there is no easily accessible bound on its expectation value (without more information on the nature of $f$), and knowing the range of $f$ is of no help either (except when $f$ is strictly positive or strictly negative).

However, the absolute error is still bounded by Hoeffding's inequality for any value of the expectation. This makes the nature of the question clear: if you have no handle on the expectation value of the function (some kind of bound, based on general properties of it or otherwise), then you have no handle on the rate of convergence of the relative error, since it explicitly depends on that unknown quantity. The absolute error estimate, however, does not, and its rate of convergence can be reliably inferred.
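The failure mode is easy to exhibit numerically. In the sketch below (the sign-changing toy integrand $f(x) = x - 1/2$ with $\mathbb{E}f = 0$ is my own choice), the absolute error is tiny, exactly as Hoeffding guarantees, while the relative error is not even defined:

```python
import numpy as np

# Hedged sketch: for a sign-changing integrand with mean zero, the
# absolute error stays controlled while the relative error is meaningless.
# Toy choice: f(x) = x - 1/2 on [0,1], with E f = 0.
rng = np.random.default_rng(2)

n = 100_000
x = rng.random(n)
estimate = np.mean(x - 0.5)

abs_error = abs(estimate - 0.0)   # small, as the Hoeffding bound guarantees
print(estimate, abs_error)
# A relative error abs_error / |E f| would divide by zero here, and for
# integrands with mean merely close to zero it blows up arbitrarily.
```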
