Solved – Bootstrapping importance sampling estimates

bootstrap, importance-sampling

I've got two algorithms, $A$ and $B$, that I want to evaluate. The algorithms differ in the distribution of points they produce: suppose $A$ generates points from distribution $P_S$ while $B$ generates points from distribution $P_T$.

Further, I've got a dataset of 21,000 points generated by algorithm $A$, i.e. drawn from $P_S$. Each point has a value $v_i \in \{-1, 0, +1\}$.

Next, I am interested in studying two quantities: (1) the mean value over the points each algorithm produces, and (2) once deployed, the probability that algorithm $A$ (resp. $B$) yields a positive mean after observing $k \in \{100, 200, \ldots, 20000\}$ points.

To answer (1) I run importance sampling. Since the points were generated by $A$, $mean_A$ is calculated straightforwardly (the importance ratio is identically 1):
$$
mean_A = \frac{1}{N} \sum_i v_i \frac{P_S}{P_S} = \frac{1}{N} \sum_i v_i
$$
For $B$ it's a bit trickier:
$$
mean_B = \frac{1}{N} \sum_i v_i \frac{P_T}{P_S}
$$
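The two estimates above can be sketched in a few lines of NumPy. The arrays here are illustrative stand-ins for the real dataset (per-point values plus each point's density under $P_S$ and $P_T$), not actual data:

```python
import numpy as np

# Hypothetical per-point data: values in {-1, 0, 1} and the densities
# of each point under the source (P_S) and target (P_T) distributions.
v   = np.array([1, -1, 0, 1, -1, 1])
p_S = np.array([0.2, 0.2, 0.2, 0.2, 0.1, 0.1])
p_T = np.array([0.3, 0.1, 0.2, 0.3, 0.0, 0.1])  # may be 0 where B never samples

# (1) For A the importance ratio P_S/P_S is 1, so this is the plain mean.
mean_A = v.mean()

# Importance-sampling estimate of B's mean using the P_S-generated sample.
w = p_T / p_S
mean_B = np.mean(v * w)
```

Note that the weights are left unnormalised here, matching the $\frac{1}{N}\sum_i v_i \frac{P_T}{P_S}$ estimator in the question; a self-normalised variant would divide by `w.sum()` instead of $N$.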

It turns out $B$ outperforms $A$, i.e. it puts more weight on positive points.

The problem arises when I approach question (2). My naive idea was to bootstrap $k$ points from the dataset uniformly ( = from $P_S$) and apply the same importance-sampling re-weighting as before. The results are discouraging: for any $k$, no matter how many bootstrap replicates I run, the importance-sampling estimates for $B$ are positive less frequently than those for $A$. That is, $A$ outperforms $B$, even though $B$ puts more weight on positive points than $A$ (as shown by experiment (1)).
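The naive bootstrap described above can be sketched as follows (the function name and synthetic inputs are mine, for illustration only):

```python
import numpy as np

def naive_bootstrap_positive_rate(v, w, k, n_boot=2000, seed=0):
    """Fraction of uniform bootstrap resamples of size k whose
    importance-weighted mean is positive.  v: point values,
    w: IS weights P_T/P_S (for algorithm A, w is all ones)."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(v)
    hits = 0
    for _ in range(n_boot):
        # Uniform resampling with replacement = bootstrapping from P_S.
        idx = rng.integers(0, n, size=k)
        if np.mean(v[idx] * w[idx]) > 0:
            hits += 1
    return hits / n_boot
```

Comparing `naive_bootstrap_positive_rate(v, np.ones_like(w), k)` against `naive_bootstrap_positive_rate(v, w, k)` reproduces the discouraging comparison between $A$ and $B$.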

A possible problem is that for a considerable fraction of the points $P_T = 0$, i.e. they are never produced by $B$. The literature I examined claims this is not a problem for IS, but it obviously is a problem for the naive bootstrapping I do: adding zero-weighted points forces $B$ to perform worse, since they are more and more likely to fill the bootstrap sample. Moreover, concentrating enormous weight on a single positive point will again result in $B$ under-performing $A$.

The only solution I see so far is to filter out all the points that are "impossible for $B$" before bootstrapping, and to compute the IS estimate with respect to the source distribution "$P_S$ with the impossible points removed".
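The filtering idea can be made concrete. If $Z = P_S(\{x : P_T(x) > 0\})$ is the source mass on $B$'s support, the retained points are iid from the truncated source with density $P_S/Z$, so the adjusted IS weight is $Z \cdot P_T/P_S$; the fraction of retained points is a Monte Carlo estimate of $Z$. A minimal sketch with hypothetical arrays:

```python
import numpy as np

# Hypothetical per-point data (stand-ins for the real dataset).
v   = np.array([ 1, -1,  0,  1, -1])
p_S = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
p_T = np.array([0.4, 0.1, 0.3, 0.2, 0.0])  # last point is impossible for B

mask = p_T > 0          # keep only points B could produce
Z = mask.mean()         # MC estimate of P_S(support of P_T)

# Retained points are ~ P_S conditioned on the support (density P_S/Z),
# so the IS weight w.r.t. the truncated source is Z * P_T / P_S.
w_trunc = Z * p_T[mask] / p_S[mask]
mean_B_trunc = np.mean(v[mask] * w_trunc)
```

The point estimate is unchanged by the truncation (it equals `np.mean(v * p_T / p_S)` on the full dataset); the difference only shows up when bootstrapping from the filtered set.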

Is this the only possible solution? Is the problem considered somewhere in the literature? Is there anything I can reference? Or should I replace this uniform bootstrap with something else (non-uniform)?

Any suggestions are appreciated.

PS. It seems to me that I can replace the naive bootstrap with the sampling/importance resampling (SIR) algorithm: given the $N$ points sampled from $A$ (the dataset), I calculate importance weights as in IS and use them (after normalisation) as probabilities for resampling. The resulting sample is approximately an iid sample from $P_T$ and thus can (?) be used as a bootstrap sample from $P_T$. Am I right?
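The SIR variant sketched in the PS amounts to a weighted bootstrap; a minimal implementation (the function name is mine):

```python
import numpy as np

def sir_resample(v, w, k, seed=0):
    """Sampling/importance resampling: draw k points with replacement,
    with probability proportional to the IS weights w = P_T/P_S.
    The resample is approximately distributed according to P_T."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v)
    w = np.asarray(w, dtype=float)
    p = w / w.sum()   # normalised weights; zero-weight points are never drawn
    idx = rng.choice(len(v), size=k, replace=True, p=p)
    return v[idx]
```

Note this automatically handles the $P_T = 0$ points (their resampling probability is zero), which is one reason the weighted resampling behaves so differently from the uniform bootstrap.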

Best Answer

Computing part (2) seems like a straightforward probability question; there is no need to involve statistics. As far as I understand, your question is how to compute the probabilities
$$P(mean_A^N \geq 0) \text{ and } P(mean_B^N \geq 0).$$

Let's compute the first one, since it's easier. Note that the joint distribution of the counts of the $N$ values $v_i$ is multinomial. For ease of notation, write $p_S^+ = P(v_i = 1)$, $p_S^0 = P(v_i = 0)$, and $p_S^- = P(v_i = -1)$. With $x$ counting the $+1$s and $y$ counting the $-1$s,
\begin{eqnarray} P(mean_A^N \geq 0) &=& P\left(\frac{1}{N}\sum^N_{i=1}v_i \geq 0\right) \nonumber \\ &=& P\left(\#(v_i = 1) \geq \#(v_i = -1)\right) \nonumber \\ &=& \sum_{x=0}^N \sum^{\min(x,N-x)}_{y=0}\frac{N!}{x!(N-x-y)!y!}(p_S^+)^x(p_S^0)^{N-x-y}(p_S^-)^y \end{eqnarray}
and similarly we compute:
\begin{eqnarray} P(mean_B^N \geq 0) &=& P\left(\frac{1}{N}\sum^N_{i=1}v_i\frac{P_T(v_i)}{P_S(v_i)} \geq 0\right) \nonumber \\ &=& P\left(\frac{p_T^+}{p_S^+}\#(v_i = 1) \geq \frac{p_T^-}{p_S^-}\#(v_i = -1)\right) \nonumber \\ &=& P\left(\frac{p_S^-p_T^+}{p_T^-p_S^+}\#(v_i = 1) \geq \#(v_i = -1)\right) \nonumber \\ &=& \sum_{x=0}^N \sum^{\min\left(\left\lfloor\frac{p_S^-p_T^+}{p_T^-p_S^+}x\right\rfloor,N-x\right)}_{y=0}\frac{N!}{x!(N-x-y)!y!}(p_S^+)^x(p_S^0)^{N-x-y}(p_S^-)^y \end{eqnarray}
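The double sums above can be evaluated directly. A minimal sketch (the function name and the `ratio` parameterisation are mine; `ratio` stands for $\frac{p_S^- p_T^+}{p_T^- p_S^+}$, and `ratio=1` recovers the unweighted case for algorithm $A$):

```python
from math import comb, floor

def prob_mean_nonneg(N, p_plus, p_zero, p_minus, ratio=1.0):
    """Exact P(weighted mean >= 0) for N iid draws from the
    three-outcome distribution, via the multinomial double sum."""
    total = 0.0
    for x in range(N + 1):              # x = #(v_i = +1)
        y_max = min(floor(ratio * x), N - x)
        for y in range(y_max + 1):      # y = #(v_i = -1)
            z = N - x - y               # z = #(v_i = 0)
            # Multinomial coefficient N! / (x! z! y!)
            coef = comb(N, x) * comb(N - x, y)
            total += coef * p_plus**x * p_zero**z * p_minus**y
    return total
```

For small $N$ this is easy to check by hand: with a single draw, the mean is non-negative unless the draw is $-1$, so the result is $1 - p_S^-$.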