Solved – Distribution of sampling without replacement

exponential-familyrankingsampling

Consider $N$ items with associated weights $w_i$. Each time, we sample one item from the remainder without replacement and the sampling probability is proportional to the weights. Continue sampling until all items are selected and we acquire a sequence. What's the distribution of this sequence? Does it belong to the exponential family?

Best Answer

Assume the item weights $W = \{w(1), \dots, w(N)\}$ are nonnegative and sum to one. Let $X=\{x_1, \dots, x_N\}$ denote a particular sequence, where $x_i$ is an integer from 1 to $N$ representing the index of the $i$th item drawn. Let $X_{i:j} = \{x_i, \dots, x_j\}$ denote a particular subsequnce. The probability of a sequence can be written as:

$$p(X \mid W) = p(x_1 \mid W) \prod_{i=2}^N p(x_i \mid X_{1:i-1}, W)$$

$p(x_1 \mid W)$ is the probability for the first item drawn. This is simply the item's weight:

$$p(x_1 \mid W) = w(x_1)$$

For subsequent draws ($i \ge 2$), $p(x_i \mid X_{1:i-1}, W)$ is the conditional probability of the $i$th item drawn, given all previously drawn items. It can be expressed as follows. The probability of drawing an item is zero if it has already been drawn. Otherwise, it's equal to the item's weight, divided by the sum of weights for all remaining items. This sum is equal to one minus the sum of weights for previously drawn items. Therefore:

$$p(x_i \mid X_{1:i-1}, W) = \frac{w(x_i)}{1 - \sum_{j=1}^{i-1} w(x_j)} \prod_{j=1}^{i-1} \mathcal{I}(x_i \ne x_j)$$

where $\mathcal{I}(\cdot)$ is the indicator function, which returns 1 if its argument is true, otherwise 0. Note that the product of indicator functions evaluates to one if and only if $x_i$ isn't a duplicate of previous items.

Final answer

Combine everything above to obtain the following. Sequences containing duplicate items have probability 0. Otherwise, if there are no duplicates:

$$p(X \mid W) = w(x_1) \prod_{i=2}^N \frac{w(x_i)}{1 - \sum_{j=1}^{i-1} w(x_j)}$$

Related Question