Number of independent samples for weighted samples

I am taking random samples from one distribution, $f(x)$, but trying to get information about another distribution, $g(x)$. I have a weighting function, $w(x)=Cg(x)/f(x)$, to correct for this. The result is that I have $N$ independent samples, with different weights, $w_i$, attached to them.

The question is then: what is a good estimate of the number of independent samples I really have?

For example, if my weights are {0.49, 0.48, 0.01, 0.01, 0.01} then I have pretty close to 2 independent samples. If they are {0.3, 0.3, 0.4} then I have about 3. Presumably there is a quantitative way to do this.

Also, given $f(x)$ and $w(x)$, how could I determine the efficiency of the sampling (i.e., how many independent samples of $g(x)$ I get, on average, from $N$ samples of $f(x)$)?

Best Answer

This "number of independent samples I really have" is called the effective sample size, $N_\text{ess}$, in simulation books. Given a sample
$$ x_1,\ldots,x_N \sim f(x) $$
leading to weights $w_i = w(x_i)$ $(1\le i\le N)$, and their normalised version
$$ \bar w_i = w_i \Big/ \sum_{j=1}^N w_j\,, $$
the estimate for $N_\text{ess}$ is given by
$$ \hat N_\text{ess} = 1 \Big/ \sum_{j=1}^N \bar w_j^2\,. $$
You can prove that $1\le \hat N_\text{ess}\le N$. In your example, the effective sample size is estimated by

    > we = c(0.49, 0.48, 0.01, 0.01, 0.01)
    > 1/sum((we/sum(we))^2)
    [1] 2.124044

a wee more than 2.
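
The same formula can be checked by hand on the second set of weights from the question, $\{0.3, 0.3, 0.4\}$, which already sum to one:
$$ \hat N_\text{ess} = \frac{1}{0.3^2 + 0.3^2 + 0.4^2} = \frac{1}{0.09 + 0.09 + 0.16} = \frac{1}{0.34} \approx 2.94\,, $$
that is, about 3, as guessed in the question.

As for the bounds $1\le \hat N_\text{ess}\le N$, here is a short sketch of one way to see them, assuming all weights are non-negative and at least one is positive. Since $0\le \bar w_j\le 1$ and $\sum_j \bar w_j = 1$,
$$ \sum_{j=1}^N \bar w_j^2 \le \sum_{j=1}^N \bar w_j = 1 \quad\Longrightarrow\quad \hat N_\text{ess}\ge 1\,, $$
with equality when a single sample carries all the weight. By the Cauchy–Schwarz inequality,
$$ 1 = \Big(\sum_{j=1}^N \bar w_j\Big)^2 \le N \sum_{j=1}^N \bar w_j^2 \quad\Longrightarrow\quad \hat N_\text{ess}\le N\,, $$
with equality when all the weights are equal, $\bar w_j = 1/N$.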

I am not sure I understand the last part of the question.
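
If the last part asks for the long-run value of the ratio $\hat N_\text{ess}/N$, which is only one possible reading of the question, the following sketch may help. It assumes that $g$ is a normalised density, that $f(x)>0$ wherever $g(x)>0$, and that $\mathbb{E}_f[w(X)^2]$ is finite. Rewriting the estimate in terms of the unnormalised weights gives
$$ \frac{\hat N_\text{ess}}{N} = \frac{\big(\sum_{i=1}^N w_i\big)^2}{N\sum_{i=1}^N w_i^2} = \frac{\big(\frac{1}{N}\sum_{i=1}^N w_i\big)^2}{\frac{1}{N}\sum_{i=1}^N w_i^2}\,, $$
and, by the law of large numbers, as $N$ grows this converges to
$$ \frac{\mathbb{E}_f[w(X)]^2}{\mathbb{E}_f[w(X)^2]} = \frac{1}{1 + \operatorname{var}_f(w(X))\big/\mathbb{E}_f[w(X)]^2} = \frac{1}{\int g(x)^2/f(x)\,\text{d}x}\,, $$
where the last equality uses $w(x) = Cg(x)/f(x)$, so that the constant $C$ cancels. Under this reading, the efficiency equals one only when $f = g$, and it drops as the weights become more variable under $f$.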