The meaning and intuition behind Laplace’s method for the maximum of a function

laplace-method

I am reading the book Bandit Algorithms. On page 257, it talks about how to approximate the maximum of a function with Laplace's method. I cannot understand how this method can be used to find the maximum of a function. The book presents Laplace's method as follows:

Assume that $f:[a,b]\rightarrow\mathbb{R}$ is twice differentiable and has a unique maximum at $x_0\in (a,b)$ with $-q=f''(x_0)<0$. Laplace's method for approximating $f(x_0)$ is to compute the integral
$$
I_s=\int_{a}^{b}\exp\left( sf(x) \right)\: dx
$$

I am wondering how we can determine $f(x_0)$ from $I_s$. I would be thankful if someone could give me an example and some intuition behind this method. I have read the Wikipedia page, but I don't understand how it can be used to find the maximum of a function.

On page 258, it presents the "Method of Mixtures" as follows:

Laplace's approximation suggests that $\max_{x} \: M_t(x)\approx \int_{\mathbb{R}^d} M_t(x)\: dh(x)$, where $h$ is some measure on $\mathbb{R}^d$ chosen so that the integral can be calculated in closed form.

I am wondering whether $h(x)$ always has to be a probability distribution and whether $M_t(x)$ can be an arbitrary function.

Best Answer

The intuition behind Laplace's method is that because the exponential function exaggerates large values, as $s\rightarrow \infty$, the function $x\mapsto e^{sf(x)}$ will eventually be totally dominated by its maximum value, which is $e^{s f(x_0)}$, where $x_0$ maximizes $f$.

It's easier to see this in the discrete case first, so let $f \colon \{1, \dotsc, n\} \rightarrow \mathbf{R}$ and let $x_0 = \mathop{\mathrm{argmax}}_{x} f(x)$ be a unique maximizer. Now, we have that \begin{align*} \sum_{x=1}^n e^{sf(x)} &= e^{s f(x_0)}\sum_{x=1}^n e^{s(f(x) - f(x_0))} \\ &= e^{s f(x_0)}\Bigl(1 + \sum_{x\neq x_0} e^{s(f(x) - f(x_0))}\Bigr) \\ &= e^{s f(x_0)}(1 + o(1)), \end{align*} where we see that the latter sum is $o(1)$ from the fact that $f(x) < f(x_0)$ for each $x \neq x_0$.

Therefore, in this case, we can determine $f(x_0)$ from the resulting limit $$ \frac{1}{s}\log\Bigl(\sum_{x=1}^n e^{sf(x)}\Bigr) \rightarrow f(x_0). $$
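To see this numerically, here is a minimal Python sketch with a made-up $f$ on $\{1,\dotsc,5\}$ (the values are arbitrary, chosen only so that the maximizer is unique):

```python
import numpy as np

# Made-up example: f on {1, ..., 5} with a unique maximum f(x0) = 2.5.
f = np.array([0.3, 1.7, 0.9, 2.5, 1.1])

for s in [1, 10, 100, 1000]:
    # (1/s) * log(sum_x exp(s * f(x))), computed stably by
    # factoring out exp(s * max f) before exponentiating.
    value = np.log(np.sum(np.exp(s * (f - f.max())))) / s + f.max()
    print(s, value)

# The printed values approach f(x0) = 2.5 as s grows.
```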

The continuous case is more complicated since we have to take into account the curvature of $f$ at $x_0$, but the idea is the same. I'll omit the technical details, which are presented in the textbook that you linked, but the upshot is that the analogous result is that \begin{align*} \int_a^b e^{s f(x)} dx &= e^{s f(x_0)} \sqrt{\frac{2\pi}{s |f''(x_0)|}}(1 + o(1)), \end{align*} from which we similarly get that $$ \frac{1}{s}\log\Bigl(\int_a^b e^{s f(x)} dx\Bigr) \rightarrow f(x_0). $$
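Again, a small numerical sketch illustrates the continuous case, here with the hypothetical choice $f(x) = -(x-1)^2 + 2$ on $[0,3]$, so that $x_0 = 1$, $f(x_0) = 2$, and $f''(x_0) = -2$:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical example: f(x) = -(x - 1)^2 + 2 on [a, b] = [0, 3].
def f(x):
    return -(x - 1.0) ** 2 + 2.0

a, b, f_x0 = 0.0, 3.0, 2.0

for s in [1, 10, 100, 1000]:
    # I_s = int_a^b exp(s f(x)) dx; factor out exp(s f(x0)) to avoid overflow,
    # and hint the peak location x0 = 1 to the adaptive quadrature.
    shifted, _ = quad(lambda x: np.exp(s * (f(x) - f_x0)), a, b, points=[1.0])
    print(s, np.log(shifted) / s + f_x0)

# (1/s) log I_s -> f(x0) = 2; the sqrt(2*pi / (s * |f''(x0)|)) prefactor
# only contributes a correction of order log(s) / s.
```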


As for your second question, $h$ need not be a probability measure; this doesn't matter much because of the very liberal meaning of $\approx$.

In particular, since we're taking $s^{-1}\log I_s$, the $\approx$ can hide any polynomial factor without affecting the result. So unless $h$ is really bad (like $dh(x) = e^{|x|}\, dx$ or something), there's no problem.

We can again see what's going on schematically in the discrete case, so let $h$ be any measure on $\{1, \dotsc, n\}$. We then have that \begin{align*} \int e^{s f(x)} dh(x) = \sum_{x=1}^n e^{s f(x)} h(x) = e^{s f(x_0)}(h(x_0) + o(1)), \end{align*} from which we have \begin{align*} \frac{1}{s}\log \Bigl(\int e^{s f(x)} dh(x)\Bigr) = f(x_0) + \frac{1}{s}\log(h(x_0) + o(1)) \rightarrow f(x_0). \end{align*}
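A quick sketch with the same made-up $f$ as before and two arbitrary (non-probability) measures $h$ illustrates that the choice of $h$ doesn't affect the limit:

```python
import numpy as np

# Same made-up f as above; two arbitrary positive measures h on {1, ..., 5}
# (neither is normalized to be a probability distribution).
f = np.array([0.3, 1.7, 0.9, 2.5, 1.1])
h1 = np.ones(5)                               # counting measure
h2 = np.array([5.0, 0.2, 3.0, 0.7, 10.0])     # arbitrary positive weights

for s in [1, 10, 100, 1000]:
    vals = [np.log(np.sum(np.exp(s * (f - f.max())) * h)) / s + f.max()
            for h in (h1, h2)]
    print(s, vals)

# Both sequences converge to f(x0) = 2.5; the choice of h only changes the
# (1/s) * log h(x0) correction, which vanishes as s -> infinity.
```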

Basically, $e^{s f(x)}$ only cares about what's going on in the immediate vicinity of $x_0$, where the value of the measure is approximately constant. But then any constant contribution gets killed by the $s^{-1} \log I_s$ transformation.
