[Math] Coupon collector problem with $k$ distinct coupon sets to complete

coupon-collectorinclusion-exclusionprobability

In the standard coupon collector problem we have an urn with $n$ different coupons, from which coupons are being collected, equally likely, with replacement. Simple analysis shows that the expected number of draws needed to collect all coupons is asymptotically $\Theta(n\log n)$.

Consider the variant of the problem in which there are $k$ sets of $n$ distinct coupons each. What is the expected number of draws needed to complete at least one series of $n$ coupons? More precisely, let $\mu$ be the uniform distribution over the set $\{x_{i,j}\}_{i \in [k], j\in[n]}$, where $[n] = \{1, \ldots, n\}$. What is the expected (asymptotic) number of draws from $\mu$ needed to obtain all members of the set $\{x_{i,j}\}_{j\in[n]}$ for at least one $i \in [k]$?

Best Answer

This can be solved using inclusion-exclusion. There are $\binom kj$ ways to choose $j$ particular sets to finish, and the probability to have completed all $j$ of them is the probability to have completed a standard coupon collection with $jn$ coupons while drawing from $kn$ coupon types. Since the expected number of draws is the sum of the non-completion probabilities over all times, it satisfies the same inclusion-exclusion relation as the probabilities. Drawing from $kn$ coupon types while collecting $jn$ increases the expected number of draws by a factor $\frac kj$. Thus the desired expectation is

\begin{align} \sum_{j=1}^k(-1)^{j-1}\binom kj\frac kjjnH_{jn} &=kn\sum_{j=1}^k(-1)^{j-1}\binom kjH_{jn} \\ &=kn\sum_{j=1}^k(-1)^{j-1}\binom kj\left(\log j+\log n+\gamma+\frac1{2jn}\right)+O\left(\frac kn\right)\\ &=kn\left(\log n+\gamma\right)+\frac12kH_k+kn\sum_{j=1}^k(-1)^{j-1}\binom kj\log j+O\left(\frac kn\right)\\ &=knH_n+\frac12kH_k-k+kn\sum_{j=1}^k(-1)^{j-1}\binom kj\log j+O\left(\frac kn\right)\;. \end{align}

For the example $n=10$, $k=2$ used in Tad's answer, this yields the approximation

$$ 20\left(H_{10}-\log2\right)+H_2-2\approx44.2164\;, $$

close to Tad's approximation.

The remaining sum is treated in Proof $\sum\limits_{k=1}^n \binom{n}{k}(-1)^k \log k = \log \log n + \gamma +\frac{\gamma}{\log n} +O\left(\frac1{\log^2 n}\right)$; substituting that expansion leads to

$$ kn\left(H_n-\log\log k-\gamma-\frac\gamma{\log k}+\frac{\pi^2+6\gamma^2}{12\log^2k}\right)+\frac12kH_k-k+O\left(\frac{kn}{\log^3k}\right)\;. $$