Multiple Sources Coupon Collector Problem Variant

coupon-collectorprobability

I am looking for help analyzing a variant of the Coupon Collector Problem where the coupons can come from more than 1 source. And the probability of each a given coupon that comes from multiple sources might differ between the sources.

Example:

Assume there are 2 Unfair Dice.

  • Die 1 has sides {A, B, C, D, E, O} with probabilities {1/10, 1/10, 1/5, 1/5, 1/20, 7/20}

  • Die 2 has sides {E, F, O} with probabilities {1/5, 1/4, 11/20}

The Collector whishes to obtain at least 1 of {A, B, C, D, E, F} The
Collector does not care if O is received, this is the probability that
no coupon is received on a given roll.

The Collector wants to know what the is the expected number of rolls $E[N]$
of each die in order to complete their collection?

From this paper The Unequal Probability variant of the Coupon Collector Problem can be solved for each die with the following:

Die 1:

$$\int_0^\infty1-(1-e^\frac{-t}{10})^2(1-e^\frac{-t}{5})^2(1-e^\frac{-t}{20})dt \approx 26.275$$

Die 2:

$$\int_0^\infty1-(1-e^\frac{-t}{5})(1-e^\frac{-t}{4})dt \approx 6.778$$

However doing this would effectively double count the E coupon in the total expectation of dice rolls.

Removing Coupon E from both formulas yields:

Die 1:

$$\int_0^\infty1-(1-e^\frac{-t}{10})^2(1-e^\frac{-t}{5})^2dt = 16.25$$

Die 2:

$$\int_0^\infty1-(1-e^\frac{-t}{4})dt = 4$$

The issue I am looking for help on is how to resolve this double counting. How can I account for the probability that the Collector might have received Coupon E while rolling Die 1 or Die 2? How would one account for the shared probability of Coupon E in the $E[N]$ of each die?

EDIT: I want to add the my central question could probably be reframed as "What is the expected number of rolls of Die 2 given that Die 1 was rolled enough to complete the Die 1 specific Coupons to obtain both Coupon E and Coupon F?"

In my example; given that I rolled Die 1 16.25 times how many times am I expected to have to roll Die 2 to complete the collection?

Best Answer

As noted in the comments and your edit, an optimal strategy is to roll die $1$ until you have all coupons specific to that die and then to roll die $2$ until you have all coupons. (The situation isn’t symmetric because die $2$ has the higher probability for the common coupon $E$, so doing it the other way around wouldn’t be optimal.)

To figure out how long this is expected to take, we need, in addition to the expected time required to collect the coupons specific to die $1$, the probability that when we have them we’re still missing coupon $E$. This is the probability that coupon $E$ is the last coupon collected in a complete collection from die $1$ (of course always ignoring coupon $O$). Due to the linearity of expectation, we don’t need to worry about the strong correlation between $E$ being the last coupon collected from die $1$ and the time required for the collection of the other coupons to finish. ($E$ is more likely to be collected last the earlier the other coupons are complete.)

The probability $p$ that $E$ is collected last from die $1$ is calculated in two different and quite instructive ways at Last coupon collected in the coupon collectors problem, once using inclusion–exclusion and once using Poissonization (the latter in more detail in this cstheory answer linked to there). The result is

$$ p=\sum_{S\subset R}(-1)^{|S|}\frac{p_E}{p_E+\sum_{m\in S}p_m}\;, $$

where $R=\{A,B,C,D\}$ is the set of the remaining coupons on die $1$. With your probabilities, this is

\begin{eqnarray} p &=& \frac11-\frac2{1+2}-\frac2{1+4}+\frac1{1+2+2}+\frac4{1+2+4}+\frac1{1+4+4} \\[5pt] &&-\frac2{1+2+2+4}-\frac2{1+2+4+4}+\frac1{1+2+2+4+4} \\[5pt] &=&1-\frac23-\frac25+\frac15+\frac47+\frac19-\frac29-\frac2{11}+\frac1{13} \\[5pt] &=&\frac{22016}{45045} \\[5pt] &\approx&49\%\;. \end{eqnarray}

Now we can calculate the time required for a full collection from both dice. First, by the standard result for a coupon collection with unequal probabilities, we have to roll die $1$ an expected number

$$ 10\left(\frac21+\frac22-\frac1{1+1}-\frac4{1+2}-\frac1{2+2}+\frac2{1+1+2}+\frac2{1+2+2}-\frac1{1+1+2+2}\right)\\ =10\left(2+1-\frac12-\frac43-\frac14+\frac12+\frac25-\frac16\right) \\=\frac{33}2 $$

of times to collect all coupons specific to that die. Then, with probability $p$ we’re still missing coupon $E$, so we have to roll die $2$ an expected number

$$ 20\left(\frac14+\frac15-\frac1{4+5}\right)=\frac{61}9 $$

of times to collect both $E$ and $F$; whereas with probability $1-p$ we already have coupon $E$ and only have to roll die $2$ an expected $4$ times to collect $F$.

Altogether, the expected number of rolls required for a full collection is

$$ \frac{33}2+\frac{22016}{45045}\cdot\frac{61}9+\left(1-\frac{22016}{45045}\right)\cdot4=\frac{3544481}{162162}\approx21.86\;. $$

Here’s Java code that checks this result by simulation.