Why is the expected number of draws given $k$ unique coupons so similar to the expected number of draws required to obtain $k$ unique coupons?

conditional-expectation, coupon-collector, expected-value, probability

Regarding the coupon collector’s problem, the question *Expected size of collection based on number of uniques* asks for the expected value of the number $N$ of draws made, given that $k$ out of $m$ unique coupons have been drawn. To make this well-defined, we need a prior for $N$. To answer the question, I assumed a uniform (improper) prior, found the corresponding posterior distribution for $N$ given $k$, and then computed its expected value using formal manipulations of a generating function. I was surprised to find that the result,

$$
\mathsf E[N\mid k]=m\sum_{r=1}^k\frac1{m-r}\;,
$$

is quite simple and very similar to the classical result for the number of draws required to obtain $k$ unique coupons,

$$
m\sum_{r=0}^{k-1}\frac1{m-r}\;,
$$

with the index merely shifted by $1$, so that the two expressions differ only by $m\left(\frac1{m-k}-\frac1m\right)=\frac k{m-k}$.
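For concreteness, here is a minimal Python sketch (the values of $m$ and $k$ and the helper names are purely illustrative) that evaluates both sums exactly with `fractions.Fraction` and confirms that they differ by exactly $\frac k{m-k}$:

```python
from fractions import Fraction

def posterior_mean(m, k):
    """E[N | k] under the uniform prior: m * sum_{r=1}^{k} 1/(m-r)."""
    return m * sum(Fraction(1, m - r) for r in range(1, k + 1))

def classical_mean(m, k):
    """Classical expected number of draws to collect k unique coupons: m * sum_{r=0}^{k-1} 1/(m-r)."""
    return m * sum(Fraction(1, m - r) for r in range(k))

m, k = 10, 4  # arbitrary example values
difference = posterior_mean(m, k) - classical_mean(m, k)
print(difference, Fraction(k, m - k), difference == Fraction(k, m - k))
```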

It seems there must be a better explanation for this than formal manipulations of a generating function – perhaps something along the lines of the classical argument with geometric distributions that leads to the classical result – but I can’t think of one. Can you?

The mystery deepens a bit if you try to heuristically derive this result from the classical one. If we denote by $T_k$ the number of draws required to draw $k$ unique coupons, then we have exactly $k$ unique coupons from draw $T_k$ until draw $T_{k+1}-1$. One might have expected $\mathsf E[N\mid k]$ to be something like the midpoint of the expected values of those two endpoints, but it turns out to be the expected value of the right endpoint: $\mathsf E[N\mid k]=\mathsf E[T_{k+1}-1]$ (still assuming a uniform prior for $N$).
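To make the claim $\mathsf E[N\mid k]=\mathsf E[T_{k+1}-1]$ concrete, here is a rough Monte Carlo sketch (the parameter values and trial count are arbitrary) that simulates $T_{k+1}-1$ and compares its average with the formula above:

```python
import random
from fractions import Fraction

def draws_until(m, j, rng):
    """Number of uniform draws from m coupon types needed to see j distinct ones."""
    seen, n = set(), 0
    while len(seen) < j:
        seen.add(rng.randrange(m))
        n += 1
    return n

m, k, trials = 10, 4, 200_000  # illustrative parameters
rng = random.Random(0)
mc_estimate = sum(draws_until(m, k + 1, rng) - 1 for _ in range(trials)) / trials
closed_form = float(m * sum(Fraction(1, m - r) for r in range(1, k + 1)))
print(mc_estimate, closed_form)  # should agree up to Monte Carlo error
```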

Best Answer

I figured this out now. The probability $p_n$ that $n$ is the last number of draws after which we have $k$ unique coupons is $\frac{m-k}m$ times the probability $q_n$ that we have $k$ unique coupons after $n$ draws, since $n$ is that last number iff the $(n+1)$-st draw yields one of the $m-k$ unseen coupons. The $p_n$ form a probability distribution over $n$, and the $q_n$ are a constant multiple of them, so normalizing the $q_n$ to obtain the posterior distribution of $N$ given $k$ yields exactly the $p_n$. Thus the expected number of draws given $k$ unique coupons (assuming a uniform prior for $N$) is the expected value of the last number of draws after which we have $k$ unique coupons, which is $1$ less than the well-known expected number of draws required to get $k+1$ unique coupons, that is,

$$ \mathsf E[N\mid k]=\left(m\sum_{r=0}^k\frac1{m-r}\right)-1=m\sum_{r=1}^k\frac1{m-r}\;. $$
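As a sanity check on this argument, here is a small Python sketch (with illustrative $m$, $k$, and truncation point) that computes $q_n$ by inclusion-exclusion, normalizes it to obtain the posterior for $N$, and compares the posterior mean with the closed form; it also checks that $\sum_n q_n=\frac m{m-k}$, which is exactly why normalizing the $q_n$ recovers the $p_n$:

```python
from math import comb

def q(n, m, k):
    """P(exactly k of m coupons seen after n draws), by inclusion-exclusion over the k chosen coupons."""
    onto = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return comb(m, k) * onto / m ** n

m, k = 10, 4            # illustrative values
n_max = 400             # truncation point; q_n decays like (k/m)^n, so the tail is negligible
ns = range(k, n_max)
weights = [q(n, m, k) for n in ns]
posterior_mean = sum(n * w for n, w in zip(ns, weights)) / sum(weights)
closed_form = m * sum(1 / (m - r) for r in range(1, k + 1))
print(posterior_mean, closed_form)   # the posterior mean matches the closed form
print(sum(weights), m / (m - k))     # and sum_n q_n = m/(m-k), so normalizing q_n gives p_n
```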