Drawing colored balls from an urn until all colors have been picked

balls-in-binscombinatoricsprobability

Problem statement

Assume we have an urn with $n$ balls in $n$ colors. (for $n=3$ you would have
3 blue, 3 red and 3 green balls in the urn). We pull a ball, check the color, and place it back into the urn. We keep drawing balls until we have picked a ball in each color.

On average (expectancy value) how many times do we draw a ball in a color we have already picked?

Attempt

Unsure where to start we wrote a Python script attempting to solve this problem

import numpy as np
import numpy.random

for n_items in range(3, 7):
    tracker = 0

    runnum = 10 ** 5
    for i in range(runnum):
        Sum = 0
        checkvec = np.zeros(n_items)
        tracker -= n_items
        while Sum < n_items:
            item = np.random.randint(0, n_items)
            checkvec[item] = 1
            Sum = sum(checkvec)
            tracker += 1

    print(n_items, round(100 * (tracker / runnum)) / 100)

Giving the following table

n Extra draws required
1 0
2 1
3 2.5
4 4+1/3

I was unable to find a closed form for this sequence, but some regression analysis
led me to the following approximation

$$
B(n) = 0.1571 n^2 + 0.662 n – 0.872
$$

Does there exists a closed form for this problem?

Best Answer

This is the standard version of the Coupon Collector's Problem, where each colour has an equal probability of getting picked, and you want to know the expected number of draws until you collect every colour.

If the probability of getting a new colour in one draw is $p$, then the expected number of draws until you get that new colour is $1/p$. The probability of getting a new colour starts at $\frac{n}{n}$, and each time the numerator decreases as you collect another colour, so it then goes $\frac{n-1}{n}$, $\frac{n-2}{n}$, ... $\frac{1}{n}$.

The total expected number of draws is therefore $$\frac{n}{n} + \frac{n}{n-1} + ... + \frac{n}{1}$$

which can be rearranged as

$$n \left(\frac{1}{1} + \frac{1}{2} + ... + \frac{1}{n}\right) = nH_n$$

where $H_n$ is the $n$th harmonic number.

If you exclude the $n$ draws in which you get a new colour you get the number of extra draws, which is $nH_n - n = n(H_n-1)$.

Related Question