As well-known the expected number of coupons $N$ required to collect the complete set of $n$ coupons in the general case of non-uniform probability distribution can be computed as:
$$
\mathbb E (N)=\sum_i\frac1{p_i}-\sum_{(i,j)}\frac1{p_i+p_j}+\sum_{(i,j,k)}\frac1{p_i+p_j+p_k}-\cdots-\frac{(-1)^n}{\underbrace{p_1+p_2+\cdots+p_n}_{=1}},\tag1
$$
where $p_i$ is the probability to obtain $i$-th coupon.
My question is: can this pattern be explained on the basis of the inclusion-exclusion principle? It really strikes that the first term $\frac1{p_i}$ is nothing else as the expected number of coupons till obtaining the $i$-th coupon, next term $\frac1{p_i+p_j}$ is the expected number of coupons till obtaining the $i$-th or the $j$-th coupon and so on.
I am looking for intuitively understandable explanation which would be consistent with the above interpretation of the separate terms.
I would appreciate any hint.
Best Answer
Yes, it is exactly inclusion exclusion. For each $t\in \mathbb N_0$ and $k\in [n]$, let
Then $E_t=\bigcup_{k=1}^n E_{k,t}$, so inclusion-exclusion applies. It all comes together with these three pieces: $$ E[N]=\sum_{t\ge 0}P(E_t),\\ P(E_{t})=\sum_{S\subseteq [n]} (-1)^{|S|+1}P\left(\bigcap_{k\in S} E_{k,t}\right),\\ P\left(\bigcap_{k\in S} E_{k,t}\right)=\left(1-\sum_{k\in S}p_k\right)^t. $$ To be clear, $S$ ranges over only nonempty subsets of $[n]$.