Consider $N$ items with associated weights $w_i$. At each step, we sample one item from the remaining items, with probability proportional to its weight, and remove it. We continue until all items have been selected, yielding a sequence. What is the distribution of this sequence? Does it belong to the exponential family?
Solved – Distribution of sampling without replacement
exponential-family, ranking, sampling
Related Solutions
You describe a sequence of rejection samples.
By definition, after selecting $k$ objects in the process of obtaining a weighted sample of $m$ objects without replacement from a list of $n$ objects, you draw one of the remaining $n-k$ objects according to their relative weights. In your description of the alternative sampling scheme, at the same stage you will be drawing from all $n$ objects according to their relative weights but you will throw back any object equal to one of the first $k$ and try again: that's the rejection. So all you have to convince yourself of is that the chance of drawing one of the remaining $n-k$ objects is the same in either case.
If this equivalence isn't perfectly clear, there's a straightforward mathematical demonstration. Let the weights of the first $k$ objects be $w_1, \ldots, w_k$ and let the weights of the remaining $n-k$ objects be $w_{k+1}, \ldots, w_n$. Write $w = w_1 + w_2 + \cdots + w_n$ and let $w_{(k)} = w_1 + w_2 + \cdots + w_k$. The chance of drawing object $i$ ($k \lt i \le n$) without replacement from the $n-k$ remaining objects is of course
$$\frac{w_i}{w_{k+1} + w_{k+2} + \cdots + w_n} =\frac{w_i}{w-w_{(k)}}.$$
In the alternative (rejection) scheme, the chance of drawing object $i$ equals
$$\sum_{a=0}^\infty \left[\left(\frac{w_{(k)}}{w}\right)^a \frac{w_i}{w}\right] =\frac{1}{1 - w_{(k)}/w}\frac{w_i}{w} = \frac{w_i}{w-w_{(k)}}$$
exactly as before. The sum arises by partitioning the event by the number of rejections ($a$) that precede drawing element $i$; its terms give the chance of a sequence of $a$ draws from the first $k$ elements followed by drawing the $i^\text{th}$ element: because these draws are independent, their chances multiply. It forms a geometric series which is elementary to put into closed form (the first equality). The second equality is a trivial algebraic reduction.
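As a sanity check, the equivalence can be verified empirically. The sketch below uses arbitrary illustrative weights and $k = 2$ already-drawn objects, draws from the remaining objects both directly (renormalized weights) and by rejection, and compares the empirical frequencies:

```python
import random

# Hypothetical weights for n = 5 objects; the first k = 2 are already drawn.
weights = [0.1, 0.3, 0.2, 0.25, 0.15]
k = 2
trials = 200_000

def draw_direct():
    # Draw one of the remaining n - k objects, weights renormalized.
    remaining = list(range(k, len(weights)))
    return random.choices(remaining, weights=[weights[i] for i in remaining])[0]

def draw_rejection():
    # Draw from all n objects; reject any of the first k and retry.
    while True:
        i = random.choices(range(len(weights)), weights=weights)[0]
        if i >= k:
            return i

direct = [draw_direct() for _ in range(trials)]
reject = [draw_rejection() for _ in range(trials)]
for i in range(k, len(weights)):
    print(i, round(direct.count(i) / trials, 3), round(reject.count(i) / trials, 3))
```

Both columns of frequencies should agree, up to simulation noise, with $w_i/(w - w_{(k)})$.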
I am going to try to expand my previous comment and correct some notation (let me know if I am mistaken). (Thanks for the suggestion @Macro)
1. In this case you need to know explicitly what $p(x_j\vert x_{j-1})$ means (note that here $x_{j-1}$ is a known value and this distribution depends on that value somehow). An example of this is
$$x_j \sim \mbox{Normal}(x_{j-1}, 1).$$
In this case, in order to sample $x_j$, you just need to sample from a normal distribution with mean $x_{j-1}$ and variance $1$. For other examples you can check the Wikipedia page for Markov chains or look at other transition kernels.
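A minimal sketch of this transition kernel, assuming an arbitrary starting value $x_0 = 0$:

```python
import random

# Simulate a short path of the Markov chain x_j ~ Normal(x_{j-1}, 1),
# starting from an assumed initial value x_0 = 0.
x = 0.0
path = [x]
for _ in range(10):
    x = random.gauss(x, 1.0)  # mean = previous value, standard deviation = 1
    path.append(x)
print(path)
```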
2. Pseudocode for obtaining a sample of size $M$ with replacement is as follows.
Starting with a known sample $\{x_i, i=1,\ldots,N\}$, repeat $M$ times:
(i). Consider partitioning the interval $(0,1)$ into $N$ subintervals $I_1=(0,w_1)$, $I_2=(w_1,w_1+w_2)$, ..., $I_N = (\sum_{j=1}^{N-1}w_j,1)$ (assuming the weights are normalized so that $\sum_{j=1}^{N}w_j = 1$).
(ii). Simulate $u\sim U(0,1)$.
(iii). Identify the interval $I_j$, $j\in\{1,...,N\}$, such that $u\in I_j$.
(iv). Take the sample $x_j$.
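The four steps above can be sketched in Python as follows. The sample, weights, and $M$ are hypothetical, and the weights are normalized inside the function in case they do not already sum to one:

```python
import random
from itertools import accumulate

def resample(sample, weights, M):
    """Draw M items with replacement from `sample`, with probabilities
    proportional to `weights`, via the interval-partition method."""
    total = sum(weights)
    # (i) cumulative right endpoints of the subintervals I_1, ..., I_N
    bounds = list(accumulate(w / total for w in weights))
    out = []
    for _ in range(M):
        u = random.random()  # (ii) u ~ U(0, 1)
        # (iii) find the interval I_j containing u (default guards rounding)
        j = next((j for j, b in enumerate(bounds) if u < b), len(bounds) - 1)
        out.append(sample[j])  # (iv) take x_j
    return out

# Hypothetical sample and weights
print(resample(['a', 'b', 'c'], [0.2, 0.5, 0.3], 10))
```

Over many draws, each element appears with frequency close to its normalized weight.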
As you can see, there is a difference between these two methods.
In the first one you have a model, $p(x_j \vert x_{j-1})$, and you simulate from it. Starting from, say, $x_0$, the next value $x_1$ is sampled from $p(x_1 \vert x_0)$; then the second value $x_2$ is sampled from $p(x_2 \vert x_1)$, and so forth. For this purpose you need to be able to simulate from the conditional distributions $p(\cdot\vert\cdot)$. That is, you obtain a sample $\{x_i, i=1,\ldots,N\}$.
In the second sampling method (re-sampling) you already have a sample $\{x_i,i=1,...,N\}$ and you obtain a new sample by picking elements from the original one. This is similar to the 'urn game'. Imagine you have an urn with $N$ elements of different sizes (illustrating the different weights), you pick one of them, record the result and put it back in the urn. This experiment is repeated $M$ times.
I hope this helps.
Best Answer
Assume the item weights $W = \{w(1), \dots, w(N)\}$ are nonnegative and sum to one. Let $X=(x_1, \dots, x_N)$ denote a particular sequence, where $x_i$ is an integer from 1 to $N$ representing the index of the $i$th item drawn. Let $X_{i:j} = (x_i, \dots, x_j)$ denote a particular subsequence. The probability of a sequence can be written as:
$$p(X \mid W) = p(x_1 \mid W) \prod_{i=2}^N p(x_i \mid X_{1:i-1}, W)$$
$p(x_1 \mid W)$ is the probability for the first item drawn. This is simply the item's weight:
$$p(x_1 \mid W) = w(x_1)$$
For subsequent draws ($i \ge 2$), $p(x_i \mid X_{1:i-1}, W)$ is the conditional probability of the $i$th item drawn, given all previously drawn items. It can be expressed as follows. The probability of drawing an item is zero if it has already been drawn. Otherwise, it's equal to the item's weight, divided by the sum of weights for all remaining items. This sum is equal to one minus the sum of weights for previously drawn items. Therefore:
$$p(x_i \mid X_{1:i-1}, W) = \frac{w(x_i)}{1 - \sum_{j=1}^{i-1} w(x_j)} \prod_{j=1}^{i-1} \mathcal{I}(x_i \ne x_j)$$
where $\mathcal{I}(\cdot)$ is the indicator function, which returns 1 if its argument is true, otherwise 0. Note that the product of indicator functions evaluates to one if and only if $x_i$ isn't a duplicate of previous items.
Final answer
Combining everything above: a sequence containing any duplicate item has probability 0. Otherwise, if there are no duplicates:
$$p(X \mid W) = w(x_1) \prod_{i=2}^N \frac{w(x_i)}{1 - \sum_{j=1}^{i-1} w(x_j)}$$
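As an illustrative check with hypothetical weights, this formula can be evaluated for every ordering; since exactly one ordering must occur, the probabilities over all $N!$ permutations should sum to one:

```python
from itertools import permutations

def seq_prob(seq, w):
    """Probability of drawing the index sequence `seq` when sampling
    all items without replacement with normalized weights `w`."""
    p, drawn_w = 1.0, 0.0
    for x in seq:
        p *= w[x] / (1.0 - drawn_w)  # weight over total remaining weight
        drawn_w += w[x]
    return p

w = [0.1, 0.2, 0.3, 0.4]  # hypothetical weights summing to one
total = sum(seq_prob(s, w) for s in permutations(range(len(w))))
print(total)  # probabilities over all N! orderings sum to 1
```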