Solved – Results on Monte Carlo estimates produced by importance sampling

importance-samplinginformation theorymonte carlo

I have been working on importance sampling fairly closely for the past year and have a few open-ended questions that I was hoping to get some help with.

My practical experience with importance sampling schemes has been that they can occasionally produce fantastic low-variance and low-bias estimates. More frequently, however, they tend to produce high-error estimates that have low sample variance but very high bias.

I am wondering whether anyone can explain exactly what kinds of factors affect the validity of importance sampling estimates? In particular, I am wondering:

1) Are importance sampling estimates guaranteed to converge to the correct result when the biasing distribution has the same support as the original distribution? If so, why does this appear to take so long in practice?

2) Is there a quantifiable relationship between the the error in an estimate produced through importance sampling and the "quality" of biasing distribution (i.e. how much it matches the zero-variance distribution)

3) Partially based on 1) and 2) – is there a way to quantify 'how much' you have to know about a distribution before you were better off using an importance sampling design than a simple Monte Carlo method.

Best Answer

Importance sampling has exactly the same validation as the basic Monte Carlo approach. At its core, it is basic Monte Carlo. Indeed, it is simply a change of reference measure, going from $$ \int h(x) f(x) \text{d}x $$ to $$ \int h(x) \dfrac{f(x)}{g(x)} g(x) \text{d}x $$ Thus the convergence is garanteed by the law of large numbers in both cases, i.e whether you simulate from $f$ or from $g$. In addition, if the term $$ \int h^2(x) \dfrac{f^2(x)}{g(x)} \text{d}x $$ is finite, the central limit theorem also applies and the speed of convergence is $\text{O}(1/\sqrt{n})$. If it "takes so long in practice", it is because the above variance factor in the CLT can be quite large. But, and I insist, the speed is the same as with regular Monte Carlo, $\text{O}(1/\sqrt{n})$.

The quality of an importance sampling distribution is thus directly related with the above variance factor, which goes to zero for the "zero-variance distribution" proportional to $|h(x)|f(x)$.

Related Question