Distribution for first time to have drawn all colors

coupon-collectorprobability

Consider an urn with $n=\sum_{i=1}^rn_i$ balls, where $n_i$ represents the number of balls of color $i$ (out of $r$ possible colors).

What’s the distribution describing the first time it takes to have drawn all colors, with replacement? For example, with red, blue, and green balls and a draw sequence of: blue, blue, blue, red, red, blue, green, the time is seven. Is this a known (parametric) distribution?

Best Answer

Your problem is an instance of what is known as the coupon collector's problem where you have $\ r\ $ "coupons" with probabilites $\ \frac{n_1}{n},\frac{n_2}{n},\dots,\frac{n_r}{n} \ $ of being received on each draw. The "classical" version of the problem, on which there's an enormous literature, is to calculate the expected number of draws needed to obtain a collection containing a given number of distinct coupons when the probabilities of each coupon being received on each draw are all equal. You'll find a solution to this problem on pp.210-211 of volume $1$, $2$nd edition, of Feller's An Introduction to Probability Theory and Its Applications. The solution of the same problem for the case when the probabilities are not equal can be found in this paper.

There appears to have been much less written about the distribution of the waiting time to obtain a complete collection, and on trawling briefly through pages returned by a google search, I didn't come across any where this was derived. I give a formula for it below.

Let $\ B_{it}\ $ be the number of balls of type $\ i\ $ in the collection after $\ t\ $ draws. Then $\ \big(B_{1t},B_{2t},\dots,B_{rt}\big)\ $ is multinomially distributed: $$ P\big(B_{it}=b_i\ \text{for}\ 1\le i\le r\big)=\frac{t!}{b_1!\,b_2!\ \dots\,b_r!}\prod_{i=1}^r\Big(\frac{n_i}{n}\Big)^{b_i} $$ if $\ \sum_\limits{i=1}^rb_i=t\ $.

Let $\ T\ $ be be the draw on which the collection is completed. Then $\ T=t\ $ if and only if $\ B_{i\,t-1}\ge1\ $ for all $\ i\ $ except exactly one, say $\ i=j\ $, $\ B_{j\,t-1}=0\ $, and a ball of type j is received on draw $\ t\ $. Thus, if $\ t\ge r\ $, then \begin{align} P(T=t)&=\sum_{j=1}^rP\big(B_{j\,t-1}=0, B_{i\,t-1}\ge1\ \text{for}\ i\ne j, 1\le i\le r\big)\Big(\frac{n_j}{n}\Big)\\ &=\sum_{j=1}^r\frac{n_j}{n}\sum_{\hspace{0.7em}b_j=0\\b_i\ge1,i\ne j\\ \sum_ib_i=t-1}\frac{(t-1)!}{b_1!\,b_2!\ \dots\,b_r!}\prod_{i=1}^r\Big(\frac{n_i}{n}\Big)^{b_i}\\ &=\sum_{j=1}^r\sum_{\hspace{0.7em}b_j=1\\ b_i\ge1,i\ne j\\ \sum_ib_i=t}\frac{(t-1)!}{b_1!\,b_2!\ \dots\,b_r!}\prod_{i=1}^r\Big(\frac{n_i}{n}\Big)^{b_i}\\ &=\frac{1}{t}\sum_{j=1}^r\sum_{\hspace{0.7em}b_j=1\\ b_i\ge1,i\ne j\\ \sum_ib_i=t}\frac{t!}{b_1!\,b_2!\ \dots\,b_r!}\prod_{i=1}^r\Big(\frac{n_i}{n}\Big)^{b_i}\\ &=\frac{1}{t}\sum_{j=1}^rP\big(B_{jt}=1,B_{it}\ge1\ \text{for}\ i\ne j, 1\le i\le r\big)\ . \end{align} The $\ j^{\,\text{th}}\ $ summand in the sum on the right side of the final identity is the probability that exactly one of the first $\ t\ $ balls drawn is of the $\ j^{\,\text{th}}\ $ type, and at least one ball of every other type appears in those first $\ t\ $. The last identity thus confirms the plausible hypothesis that this probability should be $\ t\ $ times the probability that a single occurrence of a ball of the $\ j^{\,\text{th}}\ $ type during the first $\ t\ $ draws occurs on the $\ t^{\,\text{th}}\ $ draw, and at least one ball of every other type appears during those draws.

Related Question