Coupon Collecting Problem: Condition on the order of coupons you see

conditional-expectationcoupon-collectorexpected valueprobability

There are four types of coupons (call them types $1,2,3,4$) with $p_1 = p_2 = 1/8$ and $p_3 = p_4 = 3/8$. You collect coupons until you have a full set.

Q1: What is the probability that after disregarding duplicates, the order of coupons that you collect is $3241$? For example, let's say it takes you $10$ tries to get a full set and you obtain $3333324421$ which means the order of coupons collected was $3241$.

Q2: Given that the order you observe will be $3241$ (with possible duplicates), what is the expected number of coupons needed to collect a full set?

Q3: Let $X$ be the number of coupons needed to collect a full set. Is it true that $E[X] = E[X \mid 3241]P(3241) + E[X \mid 1234]P(1234) +…$ where you condition $X$ on all $4!$ possible ways that you will observe the order of new coupons that appear?


My attempt

Q1: $P(3241) = P(3)P(2 \mid 3)P(4 \mid 32)P(1 \mid 324) = \frac{3}{8}\frac{1}{5}\frac{3}{4}\frac{1}{1} = 0.05625$

Q2: $E[X \mid 3241] = 1+\frac{3+1}{1}+\frac{3+1+3}{3}+\frac{1+3+3+1}{1} = 15.33$ which is leveraging the EV of a geometric RV.

Q3: I believe yes… but when I put it in python I don't get the right answer. It could be because of my crappy scripting skills. Above in my attempt I solved for the case $3241$ and my output below matches it.

(1, 2, 3, 4) Condl EV: 7.33 | Prob: 0.0089 | EV*Prob: 0.06 | Cum Sum: 0.06

(1, 2, 4, 3) Condl EV: 7.33 | Prob: 0.0089 | EV*Prob: 0.06 | Cum Sum: 0.13

(1, 3, 2, 4) Condl EV: 10.0 | Prob: 0.0133 | EV*Prob: 0.13 | Cum Sum: 0.26

(1, 3, 4, 2) Condl EV: 12.6 | Prob: 0.0401 | EV*Prob: 0.50 | Cum Sum: 0.77

(1, 4, 2, 3) Condl EV: 10.0 | Prob: 0.0133 | EV*Prob: 0.13 | Cum Sum: 0.90

(1, 4, 3, 2) Condl EV: 12.6 | Prob: 0.0401 | EV*Prob: 0.50 | Cum Sum: 1.41

(2, 1, 3, 4) Condl EV: 7.33 | Prob: 0.0089 | EV*Prob: 0.06 | Cum Sum: 1.48

(2, 1, 4, 3) Condl EV: 7.33 | Prob: 0.0089 | EV*Prob: 0.06 | Cum Sum: 1.54

(2, 3, 1, 4) Condl EV: 10.0 | Prob: 0.0133 | EV*Prob: 0.13 | Cum Sum: 1.68

(2, 3, 4, 1) Condl EV: 12.6 | Prob: 0.0401 | EV*Prob: 0.50 | Cum Sum: 2.19

(2, 4, 1, 3) Condl EV: 10.0 | Prob: 0.0133 | EV*Prob: 0.13 | Cum Sum: 2.32

(2, 4, 3, 1) Condl EV: 12.6 | Prob: 0.0401 | EV*Prob: 0.50 | Cum Sum: 2.83

(3, 1, 2, 4) Condl EV: 12.6 | Prob: 0.0187 | EV*Prob: 0.23 | Cum Sum: 3.07

(3, 1, 4, 2) Condl EV: 15.3 | Prob: 0.0562 | EV*Prob: 0.86 | Cum Sum: 3.93

(3, 2, 1, 4) Condl EV: 12.6 | Prob: 0.0187 | EV*Prob: 0.23 | Cum Sum: 4.17

(3, 2, 4, 1) Condl EV: 15.3 | Prob: 0.0562 | EV*Prob: 0.86 | Cum Sum: 5.03

(3, 4, 1, 2) Condl EV: 18.0 | Prob: 0.1124 | EV*Prob: 2.02 | Cum Sum: 7.05

(3, 4, 2, 1) Condl EV: 18.0 | Prob: 0.1124 | EV*Prob: 2.02 | Cum Sum: 9.08

(4, 1, 2, 3) Condl EV: 12.6 | Prob: 0.0187 | EV*Prob: 0.23 | Cum Sum: 9.32

(4, 1, 3, 2) Condl EV: 15.3 | Prob: 0.0562 | EV*Prob: 0.86 | Cum Sum: 10.1

(4, 2, 1, 3) Condl EV: 12.6 | Prob: 0.0187 | EV*Prob: 0.23 | Cum Sum: 10.4

(4, 2, 3, 1) Condl EV: 15.3 | Prob: 0.0562 | EV*Prob: 0.86 | Cum Sum: 11.2

(4, 3, 1, 2) Condl EV: 18.0 | Prob: 0.1124 | EV*Prob: 2.02 | Cum Sum: 13.3

(4, 3, 2, 1) Condl EV: 18.0 | Prob: 0.1124 | EV*Prob: 2.02 | Cum Sum: 15.3

I get $15.3$ but the true answer is $437/35 \approx 12.48$.

Best Answer

Your calculation of the conditional expectations is wrong – though I must admit it seemed plausible to me at first and I had to write a simulation to pinpoint the mistake.

To see why it can't be right, consider three types of coupons with probabilities $p_1\gg p_2\gg p_3$ arriving in the order $231$. The way you're calculating the conditional expectation for the $3$ to appear, you're ignoring coupons of type $1$ and assuming that as long as you know that $3$ is next to appear, you can treat the process as if only types $2$ and $3$ existed, leading to the expectation $\frac{p_2+p_3}{p_3}$. But that's a large number, whereas if $3$ comes before $1$ it's most likely to come right away, so the expectation should be close to $1$.

Once $2$ has appeared, the expected time to see the next new type is $\frac1{1-p_2}=\frac1{p_1+p_3}$, and this is independent of whether that next new type happens to be $1$ or $3$: If you calculate the conditional expectation the pedestrian way as

$$ \frac{\sum_nnp_2^np_3}{\sum_np_2^np_3}\;, $$

$p_3$ drops out. So

$$ \mathsf E[X\mid3241]=1+\frac85+2+8=12.6\;, $$

and if you do them all like that, the answer comes out right.

Related Question