[Math] Coupon Collector Problem (Almost)

coupon-collector

Here's a variation of the coupon collector problem from Ross that I'm stuck on:

There are $n$ types of coupons. Each newly obtained coupon is, independently, type $i$ with probability $p_i, i = 1,…,n$. Find the expected number and the variance of the number of distinct types obtained in a collection of k coupons.

I thought maybe I would start with an indicator:

Let $X$ be the total number of distinct coupons and $X_i$ the probability that $i^{th}$ is a new coupon. Then $\sum_{i=1}^{n} X_i = X$. Each indicator is 1 w.p. $x/k$. I don't think this is correct though.

Best Answer

The solution for the expectation has already been given in the comments. The calculation is slightly simplified by considering the number $Y=n-X$ of coupon types not collected, with $\mathbb E[X]=n-\mathbb E[Y]$ and $\operatorname{Var}(X)=\operatorname{Var}(Y)$. Let $Y_i$ denote the indicator variable of the event that coupon type $i$ is not obtained. Its expectation is the probability $(1-p_i)^k$ of not obtaining a coupon of type $i$, so the expected number of coupons not obtained is

$$\mathbb E[Y]=\mathbb E\left[\sum_iY_i\right]=\sum_i(1-p_i)^k\;.$$

The variance is calculated analgously by expressing it in terms of expectations:

\begin{align} \operatorname{Var(Y)}&=\mathbb E\left[\left(\sum_iY_i\right)^2\right]-\mathbb E\left[\sum_iY_i\right]^2\\ &=\sum_{i,j}\mathbb E[Y_iY_j]-\mathbb E[Y_i]\mathbb E[Y_j]\\ &=\sum_i(1-p_i)^k\left(1-(1-p_i)^k\right)+\sum_{i\ne j}\left((1-p_i-p_j)^k-(1-p_i)^k(1-p_j)^k\right)\;. \end{align}

Related Question