Probability Theory – How to Choose Specific Item in Weighted Sampling Without Replacement

probabilityprobability theorysampling

Given $n$ items with weight $w_n$ each — what is the probability that item $i$ is chosen in a $k$-out-of-$n$ "weighted random sampling without replacement" experiment? Can a closed-form solution that depends only on $w_i / w_\cdot$ be derived ($w_\cdot = \sum_j w_j$.)?

EDIT: A solution that depends only on $w_i / w_\cdot$ is impossible. Assume $n=3$, $i=1$, $w_1 = 1$, and two cases: (1) $w_2 = 1, w_3 = 1 + \varepsilon$, (2) $w_2 = 2, w_3 = \varepsilon > 0$. In case (1) the probability is almost $2/3$, in case (2) it is almost $1$, but in both cases $w_1 = 1$ and $w_\cdot = 3 + \varepsilon$.

What I have tried so far to solve the problem:

Let $P^n_k(w, i)$ be the probability that item $i$ is chosen in an $k$-out-of-$n$ experiment with weight vector $w$.
In the first draw, the item is chosen with probability $w_i / w_\cdot$. Otherwise, we are looking for the probability to choose this item in a $k-1$-out-of-$n-1$ experiment, with the same weight vector except for the item that has been selected. Hence:

$$
P^n_k(w, i) = w_i / w_\cdot + \sum_{j \neq i} w_j / w_\cdot \cdot P^{n-1}_{k-1}(w – j, i)
$$
$$
= w_\cdot^{-1} \left( w_i + \sum_{j \neq i} w_j \cdot P^{n-1}_{k-1}(w – j, i) \right)
$$

with $w-j$ being "the vector $w$ without the $j$th element".

How to solve this recurrence relation? (If it is correct at all…)

Best Answer

I'm pulling this from Pavlos S. Efraimidis, Paul G. Spirakis, Weighted random sampling with a reservoir, Information Processing Letters, Volume 97, Issue 5, 16 March 2006, Pages 181-185, ISSN 0020-0190, 10.1016/j.ipl.2005.11.003.

There, the authors begin by describing a basic weighted random sampling algorithm with the following definition:

Input: A population $V$ of $n$ weighted items

Output: A set $S$ with a WRS of size m

  1. Repeat Steps 2 and 3 for $k=1,2,\ldots,m$
  2. The probability of $v_i$ to be selected is:$$p_i(k) = \frac{w_i} {\sum_{S_j \in V - S} {w_j}}$$
  3. Randomly select an item $v_k \in V - S$ and insert it into $S$

The authors go on to explain how they arrive at the probability, but I'll summarize. Starting with the first item, the probability that $w_n$ is selected is its own weight divided by the sum of all weights. $$\frac{w_n}{w_1 + w_2+w_3+\ldots+w_n}$$ Easy enough. Now, the probability that each subsequent item will be chosen is its own weight divided by the sum of the remaining weights. If we do this calculation for each weight $w_i$ in order with $i=[1,n]$, we arrive at the authors' final summary equation for any permutation $\Pi$:

$$ P(\Pi) = \prod_{i=1}^{n} {\frac{w_i} {w_1 + w_2 +\ldots+w_i}} $$

Which is to say, the probability that an item is chosen can be defined as indexes of an array $w$ that contains all weights, like so: $$\frac{w(i)}{w([1,i])}$$

This isn't particularly difficult work, but I didn't want to create the proof myself, hence the quoting of Efraimidis & Spirakis.

Related Question