Intuition behind the coupon collector problem. Is there inclusion-exclusion principle in play

combinatoricscoupon-collectorexpected valueinclusion-exclusionprobability

As well-known the expected number of coupons $N$ required to collect the complete set of $n$ coupons in the general case of non-uniform probability distribution can be computed as:
$$
\mathbb E (N)=\sum_i\frac1{p_i}-\sum_{(i,j)}\frac1{p_i+p_j}+\sum_{(i,j,k)}\frac1{p_i+p_j+p_k}-\cdots-\frac{(-1)^n}{\underbrace{p_1+p_2+\cdots+p_n}_{=1}},\tag1
$$

where $p_i$ is the probability to obtain $i$-th coupon.

My question is: can this pattern be explained on the basis of the inclusion-exclusion principle? It really strikes that the first term $\frac1{p_i}$ is nothing else as the expected number of coupons till obtaining the $i$-th coupon, next term $\frac1{p_i+p_j}$ is the expected number of coupons till obtaining the $i$-th or the $j$-th coupon and so on.

I am looking for intuitively understandable explanation which would be consistent with the above interpretation of the separate terms.

I would appreciate any hint.

Best Answer

Yes, it is exactly inclusion exclusion. For each $t\in \mathbb N_0$ and $k\in [n]$, let

  • $E_t$ be the event that not all the coupons have been collected after $t$ coupons have been drawn,
  • $E_{k,t}$ be the event that coupon $k$ does not appear in the first $t$ coupons.

Then $E_t=\bigcup_{k=1}^n E_{k,t}$, so inclusion-exclusion applies. It all comes together with these three pieces: $$ E[N]=\sum_{t\ge 0}P(E_t),\\ P(E_{t})=\sum_{S\subseteq [n]} (-1)^{|S|+1}P\left(\bigcap_{k\in S} E_{k,t}\right),\\ P\left(\bigcap_{k\in S} E_{k,t}\right)=\left(1-\sum_{k\in S}p_k\right)^t. $$ To be clear, $S$ ranges over only nonempty subsets of $[n]$.