Optimal proposal for self-normalized importance sampling

importance-sampling

Consider a function $f: \mathcal X \to \mathbb R$ and a probability distribution $p$ with support on $\mathcal X$ which we can evaluate only up to a normalizing constant, i.e. we can only evaluate $\tilde p(x) = Zp(x)$ where $Z = \int_{\mathcal X} \tilde p(x) \,\mathrm dx$.
Given an integral
$$I = \int_{\mathcal X} p(x)f(x) \,\mathrm dx$$
that we want to estimate and a proposal distribution $q(x)$ that we can sample from and whose density we can evaluate, the self-normalized importance sampling estimator of $I$ is
$$\hat I = \frac{\sum_{n = 1}^N W_n f(X_n)}{\sum_{n = 1}^N W_n},$$
where $X_n \sim q$ and $W_n = \tilde p(X_n) / q(X_n)$.
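
As a rough illustration of the estimator above, here is a minimal NumPy sketch. The function name `snis_estimate`, the log-density interface, and the example itself (target $p = \mathcal N(0, 1)$ known only up to a constant, $f(x) = x^2$ so that $I = 1$, and proposal $q = \mathcal N(0, 2^2)$) are assumptions chosen for illustration, not part of the question.

```python
import numpy as np

def snis_estimate(f, log_p_tilde, sample_q, log_q, n, rng):
    """Self-normalized importance sampling estimate of I = E_p[f(X)]."""
    x = sample_q(n, rng)                # X_n ~ q
    log_w = log_p_tilde(x) - log_q(x)   # log W_n = log p~(X_n) - log q(X_n)
    log_w = log_w - log_w.max()         # rescaling the weights cancels in the ratio
    w = np.exp(log_w)
    return np.sum(w * f(x)) / np.sum(w)

# Illustrative setup (assumed): unnormalized standard normal target, wider normal proposal.
rng = np.random.default_rng(0)
estimate = snis_estimate(
    f=lambda x: x**2,
    log_p_tilde=lambda x: -0.5 * x**2,  # log p~(x), normalizing constant Z unknown
    sample_q=lambda n, rng: rng.normal(0.0, 2.0, size=n),
    log_q=lambda x: -0.5 * (x / 2.0)**2 - np.log(2.0 * np.sqrt(2.0 * np.pi)),
    n=200_000,
    rng=rng,
)
print(estimate)  # should be close to I = 1 for this example
```

Working with log-weights and subtracting the maximum is a common way to avoid overflow; it does not change $\hat I$ because any constant factor in the weights cancels between numerator and denominator.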

The optimal proposal (that minimizes the variance of $\hat I$) has the form $q_{\text{opt}}(x) \propto |f(x) - I| p(x)$ according to page 9 of Chapter 9 of Art Owen's Monte Carlo book, which in turn refers to Chapter 2 of Tim Hesterberg's thesis.
However, I haven't been able to track down any proof of this in the thesis.

How would one go about showing this?

Best Answer

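It should be noted that $q_{opt}$ actually minimizes the approximate variance of $\hat I$ given by the delta method, not the exact variance. Writing $w(x) = p(x)/q(x)$ for the normalized weight (the constant $Z$ cancels in $\hat I$), you can get this by solving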

$$ q_{opt} = \arg\min_q\mathbb{E}_q[w^2(X)(f(X)-I)^2], \; \text{ s.t.} \int q(x)dx=1 $$
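
For context, here is a sketch of where this objective comes from, assuming the usual delta-method conditions (finite second moments, large $N$). With the normalized weights $w = p/q$, the estimation error is exactly a ratio whose denominator concentrates around $\mathbb{E}_q[w(X)] = 1$, so to first order

$$ \hat I - I = \frac{\frac{1}{N}\sum_{n=1}^N w(X_n)(f(X_n)-I)}{\frac{1}{N}\sum_{n=1}^N w(X_n)} \approx \frac{1}{N}\sum_{n=1}^N w(X_n)(f(X_n)-I). $$

Since $\mathbb{E}_q[w(X)(f(X)-I)] = \int p(x)(f(x)-I)dx = 0$ and the $X_n$ are i.i.d., this gives

$$ \mathrm{Var}(\hat I) \approx \frac{1}{N}\,\mathbb{E}_q[w^2(X)(f(X)-I)^2], $$

and the factor $1/N$ does not affect the minimizer.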

Now, since: $$ \mathbb{E}_q[w^2(X)(f(X)-I)^2] = \int\frac{p^2(x)}{q(x)}(f(x)-I)^2dx = \int L(x,q(x))dx $$

for $L(x,q(x)) = \frac{p^2(x)}{q(x)}(f(x)-I)^2$, using Lagrange multipliers for calculus of variations yields $$ \begin{aligned} 0&=\frac{\partial L}{\partial q} +\lambda \\ &= -\frac{p^2(x)}{q^2(x)}(f(x)-I)^2 + \lambda \end{aligned} $$

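Thus $$ q^2(x) = \frac{p^2(x)}{\lambda}(f(x)-I)^2 \implies q_{opt}(x) \propto p(x)|f(x)-I| $$ where the positive root is taken because $q \geq 0$. The constraint $\int q(x)dx = 1$ fixes $\sqrt{\lambda} = \int p(x)|f(x)-I|dx$, and since $L(x, q)$ is convex in $q$ for $q > 0$, this stationary point is a minimum.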
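
The result can be checked numerically on a small discrete example, where the integral becomes a sum. The sketch below is only a sanity check under assumed, randomly generated $p$ and $f$ on 50 points; the names `delta_variance`, `v_opt`, `v_p` and `v_others` are made up for illustration. Plugging $q_{opt}$ into the objective gives $\left(\sum_x p(x)|f(x)-I|\right)^2$, which by Cauchy-Schwarz is a lower bound for every other proposal.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50                                  # size of the (assumed) discrete state space

p = rng.random(K)
p /= p.sum()                            # target probabilities p(x)
f = rng.normal(size=K)                  # arbitrary integrand values f(x)
I = np.sum(p * f)                       # I = E_p[f(X)]

def delta_variance(q):
    """N times the delta-method variance: sum_x p(x)^2 (f(x) - I)^2 / q(x)."""
    return np.sum(p**2 * (f - I)**2 / q)

# Claimed optimum: q_opt(x) proportional to p(x) |f(x) - I|
q_opt = p * np.abs(f - I)
q_opt /= q_opt.sum()

v_opt = delta_variance(q_opt)
v_p = delta_variance(p)                 # proposal q = p, equals Var_p(f)
# Random competing proposals drawn uniformly from the probability simplex
v_others = [delta_variance(q) for q in rng.dirichlet(np.ones(K), size=1000)]

print(v_opt, v_p, min(v_others))
```

In this setup `v_opt` should never exceed `v_p` or any entry of `v_others`, in line with the derivation.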
