[Math] Coupon Collector’s Problem with Partial Collections and Coupon Packages

combinatorics, coupon-collector, statistics

I am looking for the solution to the Coupon Collector's Problem with partial collections and coupon packages: how many draws are needed in expectation?

Preliminaries

To introduce my question I will give some preliminaries first.

Coupon Collector's Problem No. 1: Classical Variant

The classic Coupon Collector's Problem (CCP) can be described as follows: given an urn containing $n$ different balls, how often do we have to draw a ball with replacement in order to see each ball at least once?
Each ball is equally likely.
Let $X_i$ be the additional number of draws needed to go from $i-1$ distinct balls seen to $i$ distinct balls seen. Obviously, $X_0 = 0$ and $E[X_1] = \frac{n}{n} = 1$, since the first draw always yields a new ball. For $X_2$, the probability of drawing a new ball is $\frac{n-1}{n}$, so $X_2$ is geometrically distributed and $E[X_2] = \frac{n}{n-1}$. Finally, to get the last ball we need $E[X_n] = \frac{n}{1} = n$ draws on average. Let $X(n)$ be the number of draws needed to see each of the $n$ balls at least once. Then $X(n) := \sum_{i=1}^n X_i$ and
\begin{align}
E[X(n)] = n\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{2}+\frac{1}{1}\right) = nH_n = n \sum_{i=0}^{n-1} \frac{1}{n-i}
\end{align}
where $H_n$ is the $n$-th harmonic number and $E$ denotes the expected value.
There are plenty of references for this result, for example this book on page 303.
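To make the formula concrete, here is a small Python sketch (the function names are my own) that evaluates $nH_n$ and checks it against a Monte Carlo simulation:

```python
import random

def expected_draws_full(n: int) -> float:
    """Exact expectation n*H_n for the classical coupon collector."""
    return n * sum(1.0 / i for i in range(1, n + 1))

def simulate_full(n: int, trials: int = 10_000) -> float:
    """Monte Carlo estimate: draw uniformly with replacement until all n balls are seen."""
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            draws += 1
        total += draws
    return total / trials

print(expected_draws_full(10))  # 29.2896..., i.e. 10 * H_10
print(simulate_full(10))        # should be close to the exact value
```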

Coupon Collector's Problem No. 2: Partial Collection

The CCP2 can be formulated as follows:
given an urn containing $n$ different balls, how often do we have to draw a ball with replacement in order to see $1\leq k\leq n$ distinct balls at least once?
Each ball is equally likely.
We denote the required number of draws by $X_k(n)$.
We can simply truncate the summation from CCP1:
\begin{align}
E[X_k(n)] = n\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{n-(k-1)}\right) = n(H_n - H_{n-k}) = n \sum_{i=0}^{k-1} \frac{1}{n-i}.
\end{align}
See, for example, equation two in this paper.
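As a quick sanity check, the truncated sum is easy to evaluate; a minimal Python sketch (naming my own) of the formula above:

```python
def expected_draws_partial(n: int, k: int) -> float:
    """Exact expectation n*(H_n - H_{n-k}) for collecting k of n coupons."""
    return n * sum(1.0 / (n - i) for i in range(k))

print(expected_draws_partial(10, 10))  # equals 10 * H_10 = 29.2896...
print(expected_draws_partial(10, 5))   # collecting only half is much cheaper
```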

Coupon Collector's Problem No. 3: Coupon Packets

The CCP3 can be formulated as follows:
given an urn containing $n$ different balls, how often do we have to draw subsets of $1 \leq s\leq n$ distinct balls in order to see each ball at least once? Each ball and each subset is equally likely.
We denote the required number of draws by $X^s(n)$; the answer is given on page eight of this paper as follows:
\begin{align}
E[X^s(n)] &= \sum_{i=1}^{n-s} (-1)^{i+1} \binom{n}{i} \frac{1}{1-\frac{\binom{n-i}{s}}{\binom{n}{s}}}
+ \sum_{i=1}^s (-1)^{n-s+1+i} \binom{n}{n-s+i}\\
&= \binom{n}{s} \sum_{i=1}^{n-s} (-1)^{i+1} \frac{\binom{n}{i}}{\binom{n}{s}-\binom{n-i}{s}}
+ \sum_{i=1}^s (-1)^{n-s+1+i} \binom{n}{n-s+i}
\end{align}
or as equation 7 in this paper:
\begin{align}
E[X^s(n)] = \binom{n}{s} \sum_{i=1}^{n} (-1)^{i+1} \frac{\binom{n}{i}}{\binom{n}{s}-{\binom{n-i}{s}}}
.
\end{align}
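The equation-7 form is convenient to evaluate directly, since $\binom{n-i}{s}$ vanishes for $i > n-s$ and those terms reduce to $(-1)^{i+1}\binom{n}{i}$, reproducing the second sum of the first form. A hedged Python sketch (naming mine):

```python
from math import comb

def expected_packets(n: int, s: int) -> float:
    """E[X^s(n)]: expected number of uniform s-subset draws until all n
    coupons are seen, using the equation-7 closed form.  math.comb(n - i, s)
    returns 0 once n - i < s, so the sum can safely run up to i = n."""
    return comb(n, s) * sum(
        (-1) ** (i + 1) * comb(n, i) / (comb(n, s) - comb(n - i, s))
        for i in range(1, n + 1)
    )

print(expected_packets(10, 1))  # reduces to 10 * H_10 = 29.2896...
print(expected_packets(10, 3))  # packets of size 3 need far fewer draws
```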

Question

The next problem instance describes my problem and the corresponding question.

Coupon Collector's Problem No. 4: Coupon Packets and Partial Collection

The CCP4 can be formulated as follows:
given an urn containing $n$ different balls, how often do we have to draw subsets of $1 \leq s\leq n$ distinct balls in order to see $1\leq k\leq n$ distinct balls at least once? Each ball and each subset is equally likely.
We denote the required number of draws by $X_k^s(n)$.
\begin{align}
E[X_k^s(n)] &= \mathrm{?}\\
&\approx \frac{1}{s} \sum_{i=0}^{k-1} \frac{n-(i\bmod s)}{n-i}
\end{align}
I have tried various approaches to derive the correct value of $E[X_k^s(n)]$, but it seems that either the solution requires some trick or I am simply not seeing it right now. Therefore I would like to know whether somebody knows the solution to this problem. Since it appears to be a combination of CCP2 and CCP3, I suspect there is a fairly elegant approach.
At the moment I only have the incorrect approximation above, but I would like the exact value.
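For reference, here is a Monte Carlo sketch (names my own) of $E[X_k^s(n)]$ against which the approximation above can be compared:

```python
import random

def approx_partial_packets(n: int, s: int, k: int) -> float:
    """The (admittedly inaccurate) approximation stated above."""
    return sum((n - (i % s)) / (n - i) for i in range(k)) / s

def simulate_partial_packets(n: int, s: int, k: int, trials: int = 20_000) -> float:
    """Monte Carlo estimate of E[X_k^s(n)]: draw uniform s-subsets until
    at least k distinct balls have been seen."""
    total = 0
    balls = list(range(n))
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < k:
            seen.update(random.sample(balls, s))
            draws += 1
        total += draws
    return total / trials

print(approx_partial_packets(20, 3, 15))    # the approximation
print(simulate_partial_packets(20, 3, 15))  # an estimate of the true value
```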

I am thankful for any hints to references or solutions. Also, if there is no solution known yet, I would also appreciate this information.

Best Answer

You may refer to Theorem 1 of Kobza (2007), "A Survey of the Coupon Collector's Problem with Random Sample Sizes", for the solution of CCP4.

A fixed packet size $s$ is just a special case of that theorem. The transition probability $p_{ij}$ between states $i$ and $j$ follows the hypergeometric distribution, where the states $i$ and $j$ denote the numbers of distinct coupons observed before and after a draw, respectively. Given the $p_{ij}$, you can construct the transition matrix of a Markov chain that is absorbing at state $i=n$. The solution to your CCP3 is the expected number of steps until the Markov chain is absorbed, and $E[X_k^s(n)]$ in CCP4 is the expected number of steps to reach state $k$ (or beyond, since a single draw may overshoot $k$) from state $0$. Just like $E[X_n^s(n)]$, it can be calculated using the fundamental matrix of the absorbing Markov chain.

By the way, there is a Matlab function hygepdf() to obtain $p_{ij}$.
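In Python, the same computation can be sketched with math.comb for the hypergeometric weights and numpy for the fundamental matrix (the function name and structure are my own, not taken from the paper):

```python
import numpy as np
from math import comb

def expected_partial_packets(n: int, s: int, k: int) -> float:
    """E[X_k^s(n)] via the fundamental matrix of an absorbing Markov chain.
    States count the distinct coupons seen; all states >= k are lumped into
    a single absorbing state."""
    # Q holds the transition probabilities among the transient states 0..k-1.
    Q = np.zeros((k, k))
    for i in range(k):
        for m in range(s + 1):              # m = number of new coupons in one packet
            j = i + m
            if j < k:
                # hypergeometric p_ij: m new from the n-i unseen, s-m from the i seen
                Q[i, j] = comb(n - i, m) * comb(i, s - m) / comb(n, s)
    N = np.linalg.inv(np.eye(k) - Q)        # fundamental matrix (I - Q)^{-1}
    return N[0].sum()                        # expected number of draws from state 0

print(expected_partial_packets(10, 1, 10))  # reduces to 10 * H_10 = 29.2896...
print(expected_partial_packets(20, 3, 15))  # compare with a Monte Carlo estimate
```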
