Drawing marbles until you run out of one color

probabilityprobability distributions

You have a bag with marbles and draw without replacement. The marbles have $k$ distinct colors and there are exactly $n$ marbles of each color. All marbles are equally likely to be drawn. You continue drawing until the bag runs out of a color so you have drawn all $n$ marbles of that color. What is the expected number of marbles you need to draw and what is the probability distribution to model this process?

You clearly have do draw at least $n$ marbles and at most $k(n-1) +1$ by the pigeon hole principle. This looks similar to the negative hypergeometric distribution but it doesn't quite fit because I want to stop drawing the moment I run out of any color not just one particular color.

Simplest example: There are $2$ red and $2$ green marbles in the bag. Without loss of generality let the first marble drawn be red. There is now a $1/3$ probability that the second marble is also red and I stop. If the second marble is green, the third marble will be either red or green and the bag will be out of that color so I stop no matter what. So with probability $2/3$ I will draw $3$ marbles. Hence the expected number of marbles drawn is $2\cdot (1/3) + 3\cdot (2/3)=2.67$.

Best Answer

Let $X$ be the random variable denoting the total number of drawn marbles. We are interested in the expected number of drawn marbles, i.e. $\mathbb{E}[X]$. If we label our colours with the numbers $1,..,k$ then $X=\sum_{j=1}^kX_j$ for $X_j$ being the number of drawn colour-$j$ marbles. Note that $X_j$ is distributed according to a multivariate hypergeometric random variable, where the total number of draws is also a random variable $X$ and to be determined. Hence

\begin{eqnarray} \mathbb{E}[X] &=& \mathbb{E}[\sum_{m=n}^{k(n-1)+1}m\chi_{X=m}] \nonumber \\ &=& \sum_{m=n}^{k(n-1)+1}m\mathbb{P}[X=m] \nonumber \\ &=&\sum_{m=n}^{k(n-1)+1}m\mathbb{P}[\exists!X_j=n\text{ and last drawn marble is of colour }j]. \nonumber \end{eqnarray}

Now all left to do is to figure out the number $A_m=\mathbb{P}[\exists!X_j=n\text{ and last drawn marble is of colour }j]$. We have that $X_j$ is a multivariate hypergeometric random variable with $m$ draws, $k$ colours all of which we have $n$ marbles:

This means that the probability of drawing $(a_j)_{j=1,..,k}$ marbles of each colour $j$ with a total of $m$-marbles is precisely $$ \mathbb{P}_{\text{Hypergeometric}_{m,n,k}}[\text{Draw } (a_j)_{j=1,..,k}]=\frac{\binom{n}{a_1} ... \binom{n}{a_k}}{\binom{nk}{m}} $$

A first guess would be

\begin{eqnarray} A_m &"="& k \sum_{\substack{a_1,..,a_{k-1}=0 \\ a_1+...a_{k-1}+n=m}}^{n-1}\frac{\binom{n}{a_1}...\binom{n}{a_{k-1}}}{\binom{nk}{m}} \nonumber. \\ \end{eqnarray}

However this does not rule out the cases, where we have not drawn colour $j$ lastly. To account for this we simply draw one marble less of colour $j$ and sample the remaining.

\begin{eqnarray} A_m &=& k \frac{1}{nk-m+1} \sum_{\substack{a_1,..,a_{k-1}=0 \\ a_1+...a_{k-1}+n=m}}^{n-1}\frac{\binom{n}{n-1}\binom{n}{a_1}...\binom{n}{a_{k-1}}}{\binom{nk}{m-1}} \nonumber\\ &=& \sum_{\substack{a_1,..,a_{k-1}=0 \\ a_1+...a_{k-1}+n=m}}^{n-1}\frac{\binom{n}{a_1}...\binom{n}{a_{k-1}}}{\binom{nk-1}{m-1}}.\nonumber\\ \end{eqnarray}

Thus we have the explicit formula $$ \mathbb{E}[X]=\sum_{m=n}^{k(n-1)+1}mA_m. $$

Edit 1: Using Vandermonde's identity (for binomial coefficients) we (almost) have that \begin{eqnarray} A_m &"="& \frac{\binom{n(k-1)}{m-n}}{\binom{nk-1}{m-1}} .\nonumber\\ \end{eqnarray} However note that the indices $a_j$ cannot attain the value $n$.

Related Question