[Math] Generalization on Coupon Collector’s Problem

co.combinatoricspr.probability

Player extracts card from the deck (which has infinite number of size) to obtain one of $k$ colors of cards. The possibility that the player pick a card with $i$th color is given by $p_i>0$. Of course, $\sum_{i=1}^{k}p_{i} = 1$. If you collect at least one card from all colors, then we call you completed one set of cards. After certain number of trials (as the player want to end), dealer gives player $m$ dollars, where $m$ is the number of set of cards the player have completed. My object is to calculate the expectation value of number of trials to get $m$ dollar, where $m$ is given.

To rewrite it more mathematically:

Let $X_1,X_2,\cdots,X_k$ be multinomial distribution with probability $p_i>0$ respectively. Let $g(k,m)$ be the least number of trials which making $\min(X_i)=m$. $g(k,m)$ forms a distribution of the number of trials, namely $n$. What is the explicit formula of $\mathbb{E}(g(k,m))$?

We know that if $m=1$, it becomes the Coupon Collector's problem. The exact formula is given by

$$\mathbb{E}(g(k,1)) = \int_{0}^{\infty}(1-\prod_{i=1}^{k}(1-\exp(-p_{i}t)))dt$$

At first, I tried the simplest case: when $k=2$, the probability that after exactly $n$ trials we get $m$ dollars is $$\mathbf{P}(X_{1}=m-1,X_{2}=n-m)\times p + \mathbf{P}(X_{1}=n-m,X_{2}=m-1)\times (1-p)$$
$$={n\choose m-1} p^{m}(1-p)^{m}(p^{n-2m}+(1-p)^{n-2m})$$
Therefore:
$$\mathbb{E}(g(2,m))=\sum_{n=2m}^{\infty}{n\choose m-1} np^{m}(1-p)^{m}(p^{n-2m}+(1-p)^{n-2m})$$
Especially, $\mathbb{E}(g(2,1))=\frac{p^{2}-p+1}{p(1-p)}$, which attains minimum at $p=\frac{1}{2}$, and this result fits well, since the difference between probabilities from extracting cards of different colors increase, making one set with given number of trials will be significantly harder. I have two conjectures:

$\mathbb{E}(g(k,m))$ attains minimum value when $p_{1}=p_{2}=\cdots=p_{k}=\frac{1}{k}$.

$\mathbb{E}(g(k,m))$ will behave like $\frac{m}{\min(p_i)}$ asymptotically as $m \to \infty$.

But I failed to figure out for the general case, since it requires too many number of calculations of infinite sums over binomial coefficients.

Can I get your help? Thanks.

Best Answer

For uniform probabilities, this problem is called "double Dixie cup problem". Then $$\mathbb{E}(g(k,m)) = k \ln k + (m-1)k \ln \ln k+kC_m + o(K)$$ as $k \to \infty$, where $C_m=\gamma -\ln (m-1)!$ and $\gamma$ is Euler's constant. This result is due to Newman and Shepp (1960), and Erdős and Rényi (1961).

For the non-uniform probabilites, the problem has been studied by Doumas and Papanicolaou (The coupon collector’s problem revisited: generalizing the double Dixie cup problem of Newman and Shepp, https://arxiv.org/abs/1412.3626)

Related Question