[Math] Multivariate Hypergeometric Distribution Questions

probability

I am working through E. T. Jaynes's Probability Theory book and am having difficulty with exercise 3.2: Suppose an urn contains $N = \sum N_i$ balls, $N_1$ of color 1, $N_2$ of color 2, …, $N_k$ of color $k$. We draw $m$ balls without replacement; what is the probability that we have at least one of each color?

Supposing $k = 5$, $N_i = 10$, and $m = 5$, I reasoned the solution is $ {10 \choose 1}^5 / {50 \choose 5} $. How would I generalize this equation to arbitrary $m$ when $N_i$ and $k$ are not specified?

Thank you very much!

Update: here is my solution to the generalized problem so far:
total number of possible draws: ${N \choose m}$
ways to pick one ball of each of the $k$ colors: $\prod_i { N_i \choose 1} $
combinations of the remaining draws: ${N - k \choose m - k} $
If the probability of $m$ draws containing at least one of each of the $k$ colors is (combinations satisfying one of each color) $\times$ (combinations of the remaining draws) $\div$ (total number of draws), we get:
$ \frac{ \prod_i { N_i \choose 1}{N - k \choose m - k} }{{N \choose m}} = $ probability of drawing $m$ balls with at least one of each color.
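One way to sanity-check any candidate closed form is to estimate the probability by simulation. The sketch below (not part of the original post; the helper name `simulate` is mine) draws $m$ balls without replacement many times and counts how often all $k$ colors appear:

```python
import random

def simulate(Ns, m, trials=200_000):
    """Monte Carlo estimate of P(at least one of each color in m draws
    without replacement), where Ns[i] is the number of balls of color i."""
    urn = [color for color, n in enumerate(Ns) for _ in range(n)]
    k = len(Ns)
    hits = 0
    for _ in range(trials):
        draw = random.sample(urn, m)  # draw m balls without replacement
        if len(set(draw)) == k:       # did every color show up?
            hits += 1
    return hits / trials

# k = 5 colors, 10 balls each, m = 10 draws
print(simulate([10] * 5, 10))
```

Comparing such an estimate against a proposed formula for a few small cases quickly reveals whether the formula over- or under-counts.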

Best Answer

Let $(X_1, X_2, \ldots, X_k)$ be a random vector denoting the number of balls of each color in the sample of size $m = X_1 + X_2 + \cdots + X_k$. The event "we have at least one of each color" translates into every $X_i$ being positive. The probability we seek is thus $$ p = \mathbb{P}\left( X_1 >0 \land X_2>0 \land \ldots \land X_k>0\right) = \sum_{x_1 =1}^{m-1} \sum_{x_2 =1}^{m-1} \cdots \sum_{x_k=1}^{m-1} \frac{\binom{N_1}{x_1} \binom{N_2}{x_2} \cdots \binom{N_k}{x_k}}{\binom{N}{m}} \delta_{m,x_1+x_2+\cdots+x_k} $$ For example, for $k=5$, $N_i=10$, and $m=10$ the probability proves equal to $$ \frac{30890625}{50108674} \approx 0.6165 $$
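The sum above can be evaluated exactly with a short script (a sketch I added, not part of the original answer; the function name `p_all_colors` is mine). Exact rational arithmetic via `fractions.Fraction` reproduces the quoted value:

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_all_colors(Ns, m):
    """P(at least one ball of each color in m draws without replacement),
    computed as the direct sum over all positive counts (x_1, ..., x_k)
    with the Kronecker-delta constraint x_1 + ... + x_k = m."""
    N, k = sum(Ns), len(Ns)
    total = Fraction(0)
    for xs in product(range(1, m + 1), repeat=k):
        if sum(xs) == m:  # the delta keeps only compositions of m
            num = 1
            for Ni, xi in zip(Ns, xs):
                num *= comb(Ni, xi)
            total += Fraction(num, comb(N, m))
    return total

p = p_all_colors([10] * 5, 10)
print(p, float(p))  # → 30890625/50108674 ≈ 0.6165
```

The brute-force enumeration is exponential in $k$, so this is only practical for small numbers of colors, but it serves as an exact check.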

In the case where there is an equal number of balls of each color in the urn, i.e. $N_i = c$ for $ 1 \leqslant i \leqslant k$, we can use the inclusion-exclusion principle to find the answer. The complementary probability is $$\begin{eqnarray} 1 -p &=& \mathbb{P}\left(X_1 = 0 \lor X_2 = 0 \lor \ldots \lor X_k=0 \right) \\ &=& \sum_{1 \leqslant {i_1} \leqslant k} \mathbb{P}\left(X_{i_1}=0\right) - \sum_{1 \leqslant i_1 < i_2 \leqslant k} \mathbb{P}\left(X_{i_1}=0 \land X_{i_2}=0\right) \\ &\phantom{=}& + \sum_{1 \leqslant i_1 < i_2 < i_3 \leqslant k} \mathbb{P}\left(X_{i_1}=0 \land X_{i_2}=0 \land X_{i_3}=0\right) - \cdots \\ &\phantom{=}& - (-1)^k \mathbb{P}\left(X_1=0 \land X_2 = 0 \land \cdots \land X_k = 0\right) \end{eqnarray} $$ Due to exchangeability: $$ \mathbb{P}\left(X_{i_1}=0\right) = \mathbb{P}\left(X_1=0\right) = \frac{\binom{c}{0} \binom{(k-1)c}{m}}{\binom{k c}{m}} = \frac{\binom{(k-1)c}{m}}{\binom{k c}{m}} $$ $$ \mathbb{P}\left(X_1 = 0 \land X_2 = 0\right) = \frac{\binom{c}{0} \binom{c}{0} \binom{(k-2)c}{m}}{\binom{k c}{m}} = \frac{ \binom{(k-2)c}{m}}{\binom{k c}{m}} $$ and so on, with $\mathbb{P}\left(X_1 = X_2 = \ldots = X_s = 0\right) =\frac{ \binom{(k-s)c}{m}}{\binom{k c}{m}}$. Hence, given that $\sum_{1 \leqslant i_1 < i_2 < \ldots < i_s \leqslant k} 1 = \binom{k}{s}$, we arrive at the result: $$ 1- p = \sum_{s=1}^k (-1)^{s-1} \binom{k}{s} \frac{\binom{(k-s)c}{m}}{\binom{k c}{m}} $$ that is, $$ p = \sum_{s=0}^k (-1)^s \binom{k}{s} \frac{\binom{(k-s) c}{m}}{\binom{k c}{m}} \stackrel{s \to k-s}{=} \sum_{s=0}^k (-1)^{k-s} \binom{k}{s} \frac{\binom{s c}{m}}{\binom{k c}{m}} $$
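The closed form above is cheap to evaluate: it is a single sum over $s$, with no enumeration of compositions. A sketch (my own addition; the name `p_all_colors_equal` is hypothetical) that reproduces the worked example $k=5$, $c=10$, $m=10$:

```python
from fractions import Fraction
from math import comb

def p_all_colors_equal(k, c, m):
    """Inclusion-exclusion closed form for an urn with k colors and
    c balls of each color: P(all k colors appear among m draws)."""
    denom = comb(k * c, m)
    # sum_{s=0}^{k} (-1)^s C(k, s) C((k-s)c, m) / C(kc, m);
    # math.comb returns 0 when (k-s)c < m, so small urns drop out naturally
    return sum(
        Fraction((-1) ** s * comb(k, s) * comb((k - s) * c, m), denom)
        for s in range(k + 1)
    )

p = p_all_colors_equal(5, 10, 10)
print(p, float(p))  # → 30890625/50108674 ≈ 0.6165
```

Note the agreement with the direct sum over all compositions, while the work here is $O(k)$ binomial evaluations instead of an exponential enumeration.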
