The probability of picking a full set from multiset after $m$ draws

binomial-coefficientscombinatoricsprobabilityprobability theoryrecreational-mathematics

Suppose a bag contains $n$ balls labeled from $1$ to $n$, and suppose I have $k$ of these bags. If I open all of these $k$ bags into an urn, then the urn is effectively a multiset with $kn$ elements: $k$ balls labeled $1$, $k$ balls labeled $2$, and so on up to $n$. My question is

If I were to randomly pick out balls from the urn one by one without replacement, what's the probability of having picked out a complete set of balls labeled $1$ to $n$ after $m$ balls have been pulled out of the urn?


I'm not very familiar with probability, so I'm not sure what's the correct setup for the problem. I suspect the answer has to do with the binomial coefficients, as we're choosing $m$ elements from a set with $kn$ things, but I don't know how to account for the repetition of elements.

Any and all help is greatly appreciated. Thank you!

Best Answer

Answer: The probability you are done after $m$ draws is $$ \sum_{i=0}^n (-1)^i \binom{n}i\binom{k(n-i)}{m}\Big/\binom{kn}{m} $$ Proof: This is a routine application of the principle of inclusion-exclusion. For each $i\in \{1,\dots,n\}$, let $E_i$ be the event that none of the $m$ balls you drew are numbered $i$. We want to find $$P(E_1^c\cap \dots \cap E_n^c)=1-P(E_1\cup \dots \cup E_n).$$ By PIE, this is equal to $$ \sum_{i=0}^n(-1)^{i} \sum_{1\le j(1)<j(2)<\dots <j(i)\le n} P\big(E_{j(1)}\cap \dots \cap E_{j(i)}\big) $$ By the symmetry of the problem, for all choices of $j(1),j(2),\dots,j(i)$, we have $P\big(E_{j(1)}\cap \dots \cap E_{j(i)}\big)=P(E_1\cap \dots \cap E_i) $. Therefore, the above simplified to $$ \sum_{i=0}^n (-1)^i \binom{n}i P(E_1\cap \dots \cap E_i). $$ Finally, $E_1\cap \dots \cap E_i$ is the event that no balls numbered $1$ to $i$ are drawn. The probability of this occurring is $\binom{k(n-i)}{m}\Big/ \binom{kn}m$, since the favorable outcomes are determined by choosing $m$ balls from the $k(n-i)$ which are not labeled with a number between $1$ and $i$.