Problem regarding coloring balls drawn from a bin

balls-in-bins, coupon-collector, expected value, probability, recreational-mathematics

I'm not sure if this has been asked before, but here I have the following problem:

There are $n$ indistinguishable red balls in a bin. Each "round," $k$ balls are randomly chosen from the bin with equal probability. All $k$ balls are colored blue and put back into the bin (this means that balls that are already blue are not changed). What is the expected number of rounds needed to color all $n$ balls blue?

Is there an explicit formula for this problem? I've thought about this problem for a bit and quickly got stumped, but I realized that unless there was some insight for the problem, a recursive formula of some sort is probably needed as each round relies on the results of the previous rounds. If the formula turns out to be recursive, is there some relatively "simple" heuristic that can estimate the expected number of rounds?

There's also the tricky case of drawing only blue balls forever after initially drawing some red balls, which would result in an infinite number of rounds. But my gut feeling says that the probability of this happening is small enough that the expected number of rounds should be finite.

After searching a bit online, this seems to be (correct me if I'm wrong) the coupon collector's problem, except that multiple coupons are drawn at once instead of one at a time, and the coupons drawn in each round are all distinct. If that's the case, can any of the insights from that problem be used in the problem I have currently?

Best Answer

You already have an excellent solution, but here is a solution by a different method that renders the result in a different form.

Define $T$ to be the number of the first draw on which all red balls have been seen; we want to find $E(T)$. We will use the theorem that

$$E(T) = \sum_{m=0}^{\infty} P(T > m)$$

Now $T>m$ if and only if at least one ball has never been seen in draws one through $m$; by inclusion/exclusion, this probability is

$$P(T > m) = \sum_{j=1}^n (-1)^{j+1} \binom{n}{j} \frac{\binom{n-j}{k}^m}{\binom{n}{k}^m}$$

So

$$E(T) = \sum_{m=0}^{\infty} P(T > m) = \sum_{m=0}^{\infty}\sum_{j=1}^n (-1)^{j+1} \binom{n}{j} \frac{\binom{n-j}{k}^m}{\binom{n}{k}^m}$$

Switching the order of summation,

$$\begin{align} E(T) &= \sum_{j=1}^n \sum_{m=0}^{\infty} (-1)^{j+1} \binom{n}{j} \frac{\binom{n-j}{k}^m}{\binom{n}{k}^m} \\ &= \sum_{j=1}^n (-1)^{j+1} \binom{n}{j} \sum_{m=0}^{\infty} \frac{\binom{n-j}{k}^m}{\binom{n}{k}^m} \\ &= \sum_{j=1}^n (-1)^{j+1} \binom{n}{j} \frac{1}{1-\binom{n-j}{k}/\binom{n}{k}} \end{align}$$

In the last step we summed a geometric series.
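The final sum is easy to evaluate exactly. Here is a minimal Python sketch of it, using rational arithmetic so the alternating terms don't lose precision; the function name `expected_rounds` is just an illustrative choice, and $1 \le k \le n$ is assumed.

```python
from fractions import Fraction
from math import comb


def expected_rounds(n: int, k: int) -> Fraction:
    """Evaluate E(T) = sum_{j=1}^n (-1)^(j+1) C(n,j) / (1 - C(n-j,k)/C(n,k))."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    denom = comb(n, k)
    total = Fraction(0)
    for j in range(1, n + 1):
        # Probability that one round misses a fixed set of j balls.
        # math.comb returns 0 when k > n - j, so this is 0 for large j.
        miss = Fraction(comb(n - j, k), denom)
        total += (-1) ** (j + 1) * comb(n, j) / (1 - miss)
    return total


if __name__ == "__main__":
    print(expected_rounds(3, 2))         # small case, checkable by hand
    print(float(expected_rounds(50, 5)))
```

For $k=1$ the sum reduces to the classical coupon collector value $nH_n$, and for $k=n$ it gives exactly one round, which are two quick sanity checks.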

Related Solutions

[Math] Expiring coupon collector’s problem

This answer isn't rigorous in justifying approximations, but the result is confirmed numerically.

I'll call the $N$ different coupons colours to distinguish them more clearly from the coupons drawn.

Let $M=\alpha N\log N$, and consider the limit $N\to\infty$ for fixed $\alpha$. First, let's calculate the variance of the number of coupons drawn in the unmodified coupon collector's problem. As the expectation is obtained as the sum of the expectations of $N$ independent values, the variance is the sum of the variances of these values. The number of draws to get a new colour when $k$ colours are still missing is geometrically distributed with $p=k/N$ and thus expectation $1/p=N/k$ and variance $(1-p)/p^2=(N^2-kN)/k^2$. The sum of the expectations is the well-known result

$$ \sum_{k=1}^N\frac Nk=NH_N\sim N\log N\;, $$

where $H_N$ is the $N$-th harmonic number. The sum of the variances is

$$ \sum_{k=1}^N\frac{N^2-kN}{k^2}\sim\frac{\pi^2}6N^2-N\log N\sim\frac{\pi^2}6N^2\;. $$

Thus the standard deviation is asymptotically a fixed fraction $\pi/\sqrt6$ of $N$, and by Chebyshev's inequality for fixed $\alpha\gt1$ the process asymptotically almost surely ends before expiration sets in, so the expected number of coupons in this case is just the unmodified expected number $NH_N$.
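As a quick numerical sanity check of these two sums, here is a small Python sketch (the function name is my own choice for illustration). It evaluates the expectation and variance sums directly so they can be compared with $N\log N$ and $\frac{\pi^2}6N^2$.

```python
from math import log, pi


def collector_mean_and_variance(N: int) -> tuple[float, float]:
    """Sum the per-stage expectations N/k and variances (N^2 - kN)/k^2."""
    mean = sum(N / k for k in range(1, N + 1))
    variance = sum((N * N - k * N) / (k * k) for k in range(1, N + 1))
    return mean, variance


if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        mean, variance = collector_mean_and_variance(N)
        # Ratios to the leading asymptotic terms quoted in the text.
        print(N, mean / (N * log(N)), variance / (pi ** 2 / 6 * N ** 2))
```

The variance ratio approaches $1$ fairly quickly, while the ratio of the mean to $N\log N$ converges only logarithmically, since $H_N=\log N+\gamma+o(1)$.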

On the other hand, for the same reason, for fixed $\alpha\lt1$ the process asymptotically almost surely doesn't end before expiration sets in, so the expected number of coupons in this case is $M$ plus the expected number of coupons drawn after the onset of expiration.

To estimate the latter, let's first estimate the probability that all $N$ colours are represented in $M$ uniformly independently drawn coupons. According to Byron's answer to this question, this is

$$ \sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^M=\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^{\alpha N\log N}\;. $$

We can approximate this by

$$ \sum_{k=0}^N (-1)^k {N\choose k}\mathrm e^{-k\alpha\log N}=\sum_{k=0}^N (-1)^k {N\choose k}\left(N^{-\alpha}\right)^k=\left(1-N^{-\alpha}\right)^N\sim\exp\left(-N^{1-\alpha}\right) $$

for $N\to\infty$ if the terms of the series become negligible before the approximation breaks down. To check this, consider the logarithm of the absolute value of the (approximated) terms,

$$ \log\left(\binom Nk\mathrm e^{-k\alpha\log N}\right)\approx N\log N-k\log k-(N-k)\log(N-k)-k\alpha\log N\;, $$

and set the derivative with respect to $k$ to zero:

$$ -\log k+\log(N-k)-\alpha\log N=0 $$

to find the maximum at $k=N/(1+N^\alpha)$. Thus for $N\to\infty$ the maximum shifts towards vanishing fractions of $N$, and the approximation is asymptotically valid.
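For moderate $N$ the exact inclusion/exclusion sum can be compared with the approximation $\exp\left(-N^{1-\alpha}\right)$ directly. The following is a rough Python sketch with illustrative names; it uses exact fractions because the alternating sum cancels badly in floating point.

```python
from fractions import Fraction
from math import comb, exp, log


def prob_all_colours(N: int, M: int) -> Fraction:
    """Exact probability that M uniform independent draws show all N colours."""
    return sum(
        (-1) ** k * comb(N, k) * Fraction(N - k, N) ** M
        for k in range(N + 1)
    )


if __name__ == "__main__":
    alpha = 0.8
    for N in (10, 20, 40, 80):
        M = round(alpha * N * log(N))
        # Exact value next to the asymptotic approximation exp(-N^(1-alpha)).
        print(N, M, float(prob_all_colours(N, M)), exp(-N ** (1 - alpha)))
```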

Now a first estimate of the expected number of coupons drawn after the onset of expiration would be $\exp\left(N^{1-\alpha}\right)$, the result if at every draw the $M$ unexpired coupons were independent of the ones at previous draws. This already exhibits the desired feature of interpolating between exponential behaviour for $\alpha\to0$ and $N\log N$ behaviour for $\alpha\to1$. (Remember that $M=\alpha N\log N$ gets added to this to obtain the total expected number of coupons.)

To improve the estimate, we need to condition on the previous batches not containing all colours. Since asymptotically a batch almost surely doesn't contain all colours, the denominator in the definition of the conditional probability tends to $1$, and the probability for the current batch to contain all colours conditioned on the previous batches not containing all colours is asymptotically equal to the probability that the current batch contains all colours and the previous batches didn't.

The most important part of the condition, which is independent of the colours of recently expired coupons, is simply that the $M-1$ unexpired coupons we already had last time don't contain all $N$ colours. The probability that $M$ coupons contain all $N$ colours but the first $M-1$ of them don't is

$$ \begin{align} &\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^M-\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^{M-1} \\ \sim&\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^M-\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^M\left(1+{k\over N}\right) \\ =&\sum_{k=0}^N (-1)^k {N\choose k}\left(1-{k\over N}\right)^M\left(-\frac kN\right) \\ \sim&\sum_{k=0}^N (-1)^k {N\choose k}\left(N^{-\alpha}\right)^k\left(-\frac kN\right) \\ =&N^{-\alpha}\left(1-N^{-\alpha}\right)^{N-1} \\ \sim&N^{-\alpha}\exp\left(-N^{1-\alpha}\right)\;. \end{align} $$

Thus we obtain an improved estimate for the expected number of draws after the onset of expiration, $N^{\alpha}\exp\left(N^{1-\alpha}\right)$. In fact this will turn out to be asymptotically correct, but we need to check the effect of the conditions implied by the colours of the recently expired coupons.

To do so, imagine the drawing process reversed in time, with recently drawn coupons being removed and recently expired coupons being added. We can interpret the above calculation to show that, conditional on all $M$ coupons containing all $N$ colours, removing one coupon has a probability of $N^{-\alpha}$ of removing a unique colour, whereas with probability $1-N^{-\alpha}$ all colours remain represented. This result remains valid if we remove further coupons; the changes in $M$ and $N$ by $O(1)$ only change the result by a factor $1+O(N^{-1})$. Thus, asymptotically, conditional on all $M$ coupons containing all $N$ colours, each removed recently drawn coupon independently has a probability of $N^{-\alpha}$ of reducing the number of colours represented by one.

On the other hand, the recently expired coupons are not affected by the condition that our current set of coupons contains all colours, so the probability of regaining a particular missing colour by adding a recently expired coupon back in is just $N^{-1}$.

With this model, we can obtain a systematic expansion of the steady-state probability of completing the colours on a given draw, by considering increasing numbers of missing colours. I'll show the calculation for one additional missing colour, which is straightforward and demonstrates that the corrections don't affect the asymptotic behaviour.

We know that one colour immediately goes missing when we remove the coupon just drawn. Let $j+1$ be the number of recently drawn coupons we need to remove beyond that to lose another colour, and let $l+1$ be the number of expired coupons we have to recoup to replace the colour of the coupon just drawn. Then this history is excluded if $l\le j$, since in that case the colour just drawn gets replaced before another one goes missing, implying a full set of $N$ colours in the past. Thus we want the fraction of histories for which $l\gt j$. This is

$$ \begin{align} &\sum_{j=0}^\infty N^{-\alpha}\left(1-N^{-\alpha}\right)^j\sum_{l=j+1}^\infty N^{-1}\left(1-N^{-1}\right)^l \\ =&\sum_{j=0}^\infty N^{-\alpha}\left(1-N^{-\alpha}\right)^j\left(1-N^{-1}\right)^{j+1} \\ \sim&\frac{N^{-\alpha}}{N^{-\alpha}+N^{-1}} \\ =& \frac1{1+N^{\alpha-1}}\;. \end{align} $$

Multiplying this by the probability $N^{-\alpha}\exp\left(-N^{1-\alpha}\right)$ and taking the reciprocal yields an improved estimate for the expected number of coupons drawn after the onset of expiration, $N^\alpha\exp\left(N^{1-\alpha}\right)\left(1+N^{\alpha-1}\right)$. Note that the correction doesn't affect the asymptotic behaviour, since $1+N^{\alpha-1}\sim1$.

I also carried out the calculations for two and three additional missing colours, which are a bit more involved. I won't bore you with the details; the result is that the expected number of coupons is multiplied by rational functions of $N^{\alpha-1}$ that go to $1$ for $N^{\alpha-1}\to0$. The expansion only seems to converge for rather small values of $N^{\alpha-1}$, but that doesn't matter asymptotically.

Thus, the analysis suggests that the expected number of coupons drawn after the onset of expiration is asymptotic to $N^{\alpha}\exp\left(N^{1-\alpha}\right)$. This is difficult to test numerically for most $\alpha$, since for $\alpha$ close to $1$ the expansion in $N^{\alpha-1}$ converges very slowly and for $\alpha$ close to $0$ the expected number of draws is prohibitively large. A reasonable compromise is $\alpha=0.8$, for which the following table shows the average number of coupons drawn after the onset of expiration in $5000$ runs for $N=10\cdot2^n$ with $n=0,\dotsc,12$ and $M$ the closest integer to $0.8N\log N$. Also shown is the ratio to the asymptotic result $N^{\alpha}\exp\left(N^{1-\alpha}\right)$ and to the result of the first-order correction, $N^{\alpha}\exp\left(N^{1-\alpha}\right)\left(1+N^{\alpha-1}\right)$. Both ratios appear to be approaching $1$, the corrected one more quickly.

$$ \begin{array}{r|r|r|r|r|r|r} N&M&\langle\text{#draws}\rangle&N^{0.8}\exp(N^{0.2})&\cdot\,(1+N^{-0.2})&\text{ratio}&\text{corrected}\\\hline 10 & 18 & 28 & 31 & 50 & 0.9115 & 0.5589\\ 20 & 48 & 62 & 68 & 105 & 0.9196 & 0.5936\\ 40 & 118 & 158 & 155 & 229 & 1.0226 & 0.6918\\ 80 & 280 & 428 & 368 & 521 & 1.1638 & 0.8217\\ 160 & 650 & 1097 & 916 & 1247 & 1.1976 & 0.8790\\ 320 & 1477 & 3019 & 2403 & 3161 & 1.2563 & 0.9550\\ 640 & 3308 & 8994 & 6703 & 8544 & 1.3418 & 1.0527\\ 1280 & 7326 & 25913 & 20055 & 24850 & 1.2921 & 1.0428\\ 2560 & 16072 & 85089 & 65037 & 78573 & 1.3083 & 1.0829\\ 5120 & 34984 & 294659 & 231341 & 273258 & 1.2737 & 1.0783\\ 10240 & 75645 & 1122292 & 915127 & 1059479 & 1.2264 & 1.0593\\ 20480 & 162647 & 4998493 & 4089855 & 4651474 & 1.2222 & 1.0746\\ 40960 & 348008 & 24025351 & 21028673 & 23542526 & 1.1425 & 1.0205\\ \end{array} $$

Here's the code I used to produce the table.
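The linked code isn't reproduced on this page. As a stand-in, here is a minimal Python sketch of a simulation of this kind. It is not the original code, and it rests on assumptions: the $M$ most recently drawn coupons are the unexpired ones, a run ends as soon as those coupons show all $N$ colours, and runs that end before expiration sets in are counted as zero draws after onset. All names are hypothetical.

```python
import random
from collections import deque


def draws_until_complete(N: int, M: int, rng: random.Random) -> int:
    """Draw uniform colours until the last M draws contain all N colours."""
    if M < N:
        raise ValueError("a window of M < N coupons can never hold N colours")
    window = deque()      # colours of the unexpired coupons, oldest first
    counts = [0] * N      # how often each colour occurs in the window
    distinct = 0          # number of colours currently in the window
    draws = 0
    while True:
        colour = rng.randrange(N)
        draws += 1
        window.append(colour)
        if counts[colour] == 0:
            distinct += 1
        counts[colour] += 1
        if len(window) > M:           # the oldest coupon expires
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        if distinct == N:
            return draws


if __name__ == "__main__":
    rng = random.Random(12345)
    N, M, runs = 10, 18, 5000
    # Assumption: draws after onset of expiration = max(total - M, 0).
    after = [max(draws_until_complete(N, M, rng) - M, 0) for _ in range(runs)]
    print(sum(after) / runs)
```

Keeping per-colour counts for the sliding window makes each draw $O(1)$, which matters for the larger $N$ in the table, where a single run takes millions of draws.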

I’ve clicked XKCD’s “random” button k times and I’ve already seen all of them. What’s the expected number of XKCD’s I’ve seen

If you've seen $n$ comics out of $N$, the probability that $k$ consecutive comics are all ones you've seen is $(\frac nN)^k$, which is proportional to $n^k$ (the $N^k$ denominator is constant). So if the prior odds are uniform, the posterior odds that you've seen $1, 2, \dots, N$ comics are $$ 1^k : 2^k : 3^k : \cdots : N^k. $$ So the conditional probability that you've seen $n$ comics out of $N$ is $\frac{n^k}{1^k + 2^k + \cdots + N^k}$, and we get an expected value of $$ \frac{1 \cdot 1^k + 2 \cdot 2^k + \dots + N \cdot N^k}{1^k + 2^k + \cdots + N^k}. $$ This doesn't simplify particularly well, but when $N$ is large compared to $k$, we may approximate the sum by an integral; the numerator is approximately $ \int_0^N x^{k+1}\,dx = \frac{N^{k+2}}{k+2}$ and the denominator is approximately $\int_0^N x^k \,dx = \frac{N^{k+1}}{k+1}$.

So the expected total number of comics you've read, given that you've sampled $k$ comics and you've read all of them, is approximately $\frac{k+1}{k+2} N$.
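To see how good the integral approximation is, the exact ratio of sums can be computed directly. A small Python sketch (the function name is only for illustration), assuming the uniform prior on $n\in\{1,\dots,N\}$ used above:

```python
from fractions import Fraction


def posterior_mean_seen(N: int, k: int) -> Fraction:
    """Exact posterior mean: sum(n * n^k) / sum(n^k) over n = 1..N."""
    weights = [n ** k for n in range(1, N + 1)]
    numerator = sum(n * w for n, w in zip(range(1, N + 1), weights))
    return Fraction(numerator, sum(weights))


if __name__ == "__main__":
    N = 3000
    for k in (1, 5, 20):
        exact = float(posterior_mean_seen(N, k))
        approx = (k + 1) / (k + 2) * N
        print(k, exact, approx)
```

With $k=0$ the exact value is the prior mean $(N+1)/2$, which the approximation $\frac{k+1}{k+2}N$ matches up to the $\frac12$ lost in replacing the sum by an integral.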
