Solved – Success of Bernoulli trials with different probabilities and without replacement

bernoulli-process, binomial distribution, hypergeometric-distribution, poisson-binomial-distribution, probability

Assuming $n$ independent Bernoulli trials with different success probabilities, the Poisson binomial distribution is the discrete probability distribution of the number of successes $X$.

The hypergeometric distribution is the discrete probability distribution of the number of successes $k$ in $n$ draws without replacement.

What is the distribution of the number of successes $X$ in Bernoulli trials without replacement and with different probabilities?

Edit:
More specifically, I am trying to reason directly from probabilities.

The hypergeometric distribution and the multivariate hypergeometric distribution do not deal explicitly with probabilities (i.e., they deal with the number of successes in a number of draws). In the example of an urn with balls of different colors, is there a way for me to assess the number of successes knowing the probability of drawing each color, but in a scenario where I do not know how many balls of each color the urn contains?

In other words: assume an urn containing balls of $n$ colors, and assume I do not know the number of balls of each color, but I do know that the probabilities of sampling each color are $p = (p_1, p_2, \dots, p_n)$. I want to know, for example, the probability of sampling, without replacement, one $X_1$, one $X_2$ and one $X_5$.

Thanks

Best Answer

I would say that the closest match to your description is the multivariate hypergeometric distribution

$$ f(k_1,\dots,k_c) = \frac{\prod_{i=1}^c {K_i \choose k_i}}{ N\choose n} $$

where you sample $n$ marbles from an urn holding marbles of $c$ colors, with $K_i$ marbles of color $i$. Here $k_i$ is the number of marbles of color $i$ in the sample (so $\sum_i k_i = n$), and you can sample no more than $N = \sum_i K_i$ marbles in total.
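As a minimal sketch, the formula above can be evaluated with nothing but the standard library. The function name and the urn counts below are made up for illustration.

```python
from math import comb, prod

def multivariate_hypergeom_pmf(k, K):
    """P(k_1, ..., k_c) when drawing n = sum(k) marbles without replacement
    from an urn holding K[i] marbles of color i."""
    if len(k) != len(K):
        raise ValueError("k and K must have the same length")
    N, n = sum(K), sum(k)
    # comb(K_i, k_i) is 0 when k_i > K_i, so impossible draws get probability 0
    return prod(comb(Ki, ki) for Ki, ki in zip(K, k)) / comb(N, n)

# Hypothetical urn: 5 red, 3 green, 2 blue; draw one marble of each color
print(multivariate_hypergeom_pmf([1, 1, 1], [5, 3, 2]))  # 30 / 120 = 0.25
```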

Since you seem to be asking about the total number of successes (as with the Poisson-binomial distribution in your example), I understand that you count $c-1$ colors as "successes" and the $c$-th color as a failure. To obtain the distribution of the total number of "successes" in that case, notice that the marginals of a multivariate hypergeometric distribution are hypergeometric. So $k_\text{tot} = \sum_{i=1}^{c-1} k_i$ follows a hypergeometric distribution in which you sample $n$ marbles from an urn of $N$ marbles, of which $K_\text{tot} = \sum_{i=1}^{c-1} K_i$ are successes. This is the same as subtracting $k_c$ from $n$, where $k_c$ is itself drawn from a univariate hypergeometric distribution.

Saying this in plain English: imagine that you painted all the marbles of colors $1,\dots,c-1$ black, leaving the marbles of the $c$-th color untouched. Now you are drawing from an urn containing only two types of marbles (black vs. the $c$-th color), and you do not need to care about all the kinds of marbles you started with.
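A quick numerical check of the "paint them black" argument, as a sketch only: sum the multivariate pmf over every possible draw, grouped by the number of successes, and compare with the univariate hypergeometric pmf. The counts in `K` are hypothetical, and the last color plays the role of the failure.

```python
from itertools import product
from math import comb, prod

def mv_hypergeom_pmf(k, K):
    """Multivariate hypergeometric pmf for color counts k drawn from urn counts K."""
    return prod(comb(Ki, ki) for Ki, ki in zip(K, k)) / comb(sum(K), sum(k))

def total_successes_by_enumeration(K, n):
    """Sum the multivariate pmf over all draws of size n, grouped by the
    number of successes (every color except the last counts as a success)."""
    dist = [0.0] * (n + 1)
    for k in product(*(range(min(Ki, n) + 1) for Ki in K)):
        if sum(k) == n:
            dist[n - k[-1]] += mv_hypergeom_pmf(k, K)
    return dist

def total_successes_hypergeom(K, n):
    """Univariate hypergeometric pmf after merging all success colors into one."""
    N, K_tot = sum(K), sum(K[:-1])
    return [comb(K_tot, s) * comb(N - K_tot, n - s) / comb(N, n)
            for s in range(n + 1)]

K = [4, 3, 2, 6]  # hypothetical counts; the last color is the "failure"
n = 5
a = total_successes_by_enumeration(K, n)
b = total_successes_hypergeom(K, n)
print(max(abs(x - y) for x, y in zip(a, b)))  # difference at floating-point level
```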

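So the case with a finite number of marbles and drawing without replacement is simple. On the other hand, if you are dealing with an infinite number of marbles, then sampling with replacement does not differ from sampling without replacement. The draws are then independent, so the color counts follow a multinomial distribution and the total number of successes is binomial (the special case of the Poisson-binomial distribution in which every trial has the same success probability).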
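To illustrate the large-urn limit: if the proportion of success marbles is held fixed while the urn grows, the hypergeometric pmf of the number of successes approaches the binomial pmf with the same success probability, which is what sampling with replacement gives. The sketch below uses an arbitrary proportion of 3/10 and $n = 5$ draws; both are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(s, N, K_tot, n):
    """P(s successes) in n draws without replacement; K_tot of N marbles are successes."""
    return comb(K_tot, s) * comb(N - K_tot, n - s) / comb(N, n)

def binom_pmf(s, n, p):
    """P(s successes) in n independent draws with success probability p."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

def max_gap(scale, n=5, p_num=3, p_den=10):
    """Largest pointwise pmf difference for an urn with scale * p_den marbles,
    of which scale * p_num are successes."""
    N, K_tot = scale * p_den, scale * p_num
    return max(abs(hypergeom_pmf(s, N, K_tot, n) - binom_pmf(s, n, p_num / p_den))
               for s in range(n + 1))

for scale in (1, 10, 100, 1000):
    print(scale * 10, max_gap(scale))  # the gap shrinks as the urn grows
```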

Notice that the initial probabilities of drawing the balls are $p_i = K_i/N$, so if you are dealing with a finite number of balls, then knowing the initial probabilities together with the total $N$ is the same as knowing the counts, since $K_i = p_i N$.
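For the example in the question (one ball each of colors 1, 2 and 5), here is a sketch under the assumption that the total number of balls $N$ is known in addition to the probabilities. The probabilities, the value of $N$, and the function names are all hypothetical. Exact fractions are used so that $K_i = p_i N$ can be checked to be a whole number.

```python
from fractions import Fraction
from math import comb, prod

def counts_from_probs(p, N):
    """Recover K_i = p_i * N; fails if the result is not a whole number of balls."""
    K = [Fraction(pi) * N for pi in p]
    if sum(K) != N or any(Ki.denominator != 1 for Ki in K):
        raise ValueError("p and N are not consistent with integer counts")
    return [int(Ki) for Ki in K]

def prob_one_of_each(p, N, colors):
    """P(exactly one ball of each listed distinct color) in len(colors)
    draws without replacement. Color indices are 0-based."""
    K = counts_from_probs(p, N)
    return Fraction(prod(K[i] for i in colors), comb(N, len(colors)))

# Hypothetical urn: 5 colors, N = 20 balls; probabilities given as exact strings
p = ["1/4", "1/4", "1/5", "1/5", "1/10"]
print(counts_from_probs(p, 20))            # [5, 5, 4, 4, 2]
print(prob_one_of_each(p, 20, [0, 1, 4]))  # colors 1, 2 and 5: 50/1140 = 5/114
```

Without $N$, the probabilities alone do not pin down the counts, so the without-replacement probability cannot be computed this way.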