Importance Sampling – How to Achieve Optimal Importance Sampling with Ratio Estimator

importance-sampling, minimum-variance, unbiased-estimator

We want to approximate the following expectation:
$$\mathbb{E}[h(x)] = \int h(x)\pi(x) dx$$
where $h(x)$ is an arbitrary function and $\pi(x)$ is a distribution; for simplicity, let's assume that we actually know the normalizing constant of $\pi(x)$. We would of course like to sample from the optimal proposal distribution $$g(x) = \frac{|h(x)|\pi(x)}{Z},$$ but this is generally not a form we can sample from, and we could not even compute the importance weights, since they require knowing $Z$: $$ w(x) = \frac{\pi(x)}{g(x)} = \frac{Z \, \pi(x)}{|h(x)|\pi(x)} = \frac{Z}{|h(x)|}.$$
But if we assume $g(x)$ can be sampled from, can we use the ratio (self-normalized) importance sampling estimator $$ \frac{\int w(x)h(x)g(x)\, dx}{\int w(x)g(x)\, dx}\,? $$ To be clear, the estimator can also be written by letting $\{x^{(i)}\}_{i=1}^N$ be a set of samples from the distribution with density $g(x)$ (obtained magically). Since the unknown factor $Z$ cancels in the ratio, we can let $$w(x^{(i)}) = \frac{1}{|h(x^{(i)})|},$$
making the final estimator: $$ \mathbb{E}[h(x)] \approx \frac{\sum_{i=1}^N w(x^{(i)})h(x^{(i)})}{\sum_{i=1}^N w(x^{(i)})} $$
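As a sanity check, here is a sketch of this ratio estimator in a toy case where the optimal $g$ *can* be sampled exactly. The setup ($h(x) = |x|$, $\pi = \mathcal{N}(0,1)$) is an illustrative choice of mine, not from the question; with it, $g(x) \propto |x|e^{-x^2/2}$, i.e. $|X|$ is Rayleigh(1) with a random sign, so the true value is $\mathbb{E}[|X|] = \sqrt{2/\pi} \approx 0.7979$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative case where the optimal g CAN be sampled exactly:
# h(x) = |x|, pi = N(0, 1), so g(x) ∝ |x| exp(-x^2/2),
# i.e. |X| ~ Rayleigh(1) with a uniformly random sign.
n = 100_000
r = rng.rayleigh(scale=1.0, size=n)
x = r * rng.choice([-1.0, 1.0], size=n)

w = 1.0 / np.abs(x)                   # w(x^(i)) = 1/|h(x^(i))|; Z cancels in the ratio
h = np.abs(x)
estimate = np.sum(w * h) / np.sum(w)  # self-normalized (ratio) estimator
print(estimate)                       # should be close to sqrt(2/pi) ≈ 0.7979
```

Note that because $h > 0$ here, $w h \equiv 1$ and the estimator reduces to the harmonic mean of the $|x^{(i)}|$, which is relevant to the answer below.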

So, is it true that the above estimator is asymptotically unbiased (consistent)? Or have I missed something? If it is indeed consistent, could it then be used in conjunction with Markov chain Monte Carlo approaches to sample from $g(x)$, since those can (in theory) sample from any distribution known only up to a normalizing constant?
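The Monte Carlo idea in the question can be sketched as follows: use random-walk Metropolis to sample from $g(x) \propto |h(x)|\pi(x)$, which is known only up to its constant, then apply the ratio estimator. The concrete choices here ($h(x) = |x|$, $\pi = \mathcal{N}(0,1)$, the step size, and chain length) are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_g_unnorm(x):
    # log of the unnormalized optimal proposal g(x) ∝ |h(x)| pi(x),
    # here with the illustrative choice h(x) = |x|, pi = N(0, 1)
    return np.log(np.abs(x)) - 0.5 * x**2

# Random-walk Metropolis targeting g, known only up to its constant
n_steps, burn_in, step = 50_000, 5_000, 1.0
x = 1.0
chain = np.empty(n_steps)
for t in range(n_steps):
    prop = x + step * rng.normal()
    if np.log(rng.uniform()) < log_g_unnorm(prop) - log_g_unnorm(x):
        x = prop
    chain[t] = x

samples = chain[burn_in:]
w = 1.0 / np.abs(samples)            # w(x) = 1/|h(x)|; Z cancels in the ratio
h = np.abs(samples)
estimate = np.sum(w * h) / np.sum(w)
print(estimate)                      # should approach E[|X|] = sqrt(2/pi) ≈ 0.7979
```

Mechanically this works (the chain never needs $Z$), but as the answer below explains, the resulting estimator is not the low-variance one the "optimal" $g$ promises.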

Edit: fixed a typo. Also, I was able to prove this is consistent, so my new questions are: is this a good idea? Are there any papers analyzing it? Does it have a standard name?

Best Answer

This is an interesting [and very far from "stupid"] question that actually bothered me for a while! We cover it in Monte Carlo Statistical Methods (Section 3.3.2, pages 95-96). The crux of it is that, by dividing by the sum of the weights, the optimality vanishes. This is easy to see when $h$ is a positive function: in this case, $$ w(x) h(x) = 1 $$ and $$ w(x) = \frac{1}{h(x)}, $$ so $$ \widehat{\mathbb{E}[h(X)]} = \dfrac{1}{\frac{1}{n}\sum_{i=1}^n \frac{1}{h(x_i)}}, $$ which is the dreaded harmonic mean estimator (see also this great and definitive post by Radford Neal). The estimator is consistent (in the sense of the Law of Large Numbers), but it is likely to have an infinite variance (which takes us very far from the minimum-variance optimality of the original estimator!).
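The infinite-variance problem can be checked concretely in a toy case of my own choosing (not from the answer): with $h(x) = |x|$ and $\pi = \mathcal{N}(0,1)$, the weights under $g(x) \propto |x|e^{-x^2/2}$ are $w(x) = 1/|x|$, and $\mathbb{E}_g[w^2] \propto \int e^{-x^2/2}/|x|\, dx$ diverges logarithmically at the origin. A deterministic quadrature check (manual trapezoid rule, to stay compatible across NumPy versions):

```python
import numpy as np

def truncated_second_moment(eps, upper=10.0, n=200_001):
    # integral of exp(-r^2/2)/r over [eps, upper], via trapezoid
    # on a log-spaced grid (integrand blows up like 1/r at 0)
    r = np.logspace(np.log10(eps), np.log10(upper), n)
    f = np.exp(-r**2 / 2) / r
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(r))

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, truncated_second_moment(eps))
# each factor-of-100 reduction in eps adds roughly ln(100) ≈ 4.6:
# the integral diverges as eps -> 0, so Var_g[w(X)] is infinite
```

The truncated integral grows without bound as the cutoff shrinks, which is exactly the infinite-variance pathology of the harmonic mean estimator in this example.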

The fundamental reason why optimality does not transfer is that the variance of the ratio is quite different from the variance of the original importance sampling estimate, and thus it is not optimised by the same importance function. Sadly, since there is no closed-form expression for the variance of the ratio (only delta-method approximations are available), there is no definite result on the optimal solution $g$. Of course, one could use different optimal importance functions for the numerator and the denominator, but this does not lead anywhere in practice!
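For reference, the standard delta-method approximation alluded to here (a textbook formula, not specific to this answer) for the variance of the self-normalized estimator $\hat\mu = \sum_i w_i h_i / \sum_i w_i$ with $x_i \sim g$ is $$ \mathrm{Var}(\hat\mu) \approx \frac{1}{n}\, \frac{\mathbb{E}_g\!\left[w(X)^2 \left(h(X) - \mu\right)^2\right]}{\left(\mathbb{E}_g[w(X)]\right)^2}, \qquad \mu = \mathbb{E}_\pi[h(X)]. $$ Minimizing this approximation over $g$ gives, to first order, $g^*(x) \propto |h(x) - \mu|\,\pi(x)$ rather than $|h(x)|\pi(x)$, and it depends on the unknown $\mu$ itself, which illustrates why the unnormalized optimality does not carry over to the ratio.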
