Coupon collectors problem: variance calculation missing a term.

coupon-collectorprobabilitysummation

EDIT: for a consolidated answer to the general variance in a coupon collectors problem with unequal probabilities, see here: https://math.stackexchange.com/a/3454032/155881

In example 5.17 of the book on Introduction to probability models by Ross, he solves the coupon collector's problem, where there are $n$ coupons, each with probability $p_j$ of being collected per draw (with $\sum_{j=1}^n p_j=1$). He uses the Poisson process to come up with the following expression for the expected value of $X$, the number of coupons to be collected for completing the collection:

$$E(X) = \int\limits_0^\infty P(X>t)dt = \int\limits_0^\infty \left(1-\prod\limits_{j=1}^n (1-e^{-p_j t})\right)dt$$
Using the fact that $\int_0^\infty e^{-pt}=\frac 1 p$,

$$E(X)=\sum \frac 1 p_j -\sum_{i<j} \frac{1}{p_i+p_j}+\dots +(-1)^{n-1}\frac{1}{p_1+\dots+p_n}$$

Now, I want to use the same approach to calculate the variance. Per comment by @BGM here and also this question, we can use the following expression to get $E(X^2)$:

$$E(X^2) = \int\limits_0^\infty 2tP(X>t)dt = \int\limits_0^\infty 2t\left(1-\prod\limits_{j=1}^n(1-e^{-p_j t})\right)dt$$

Using the fact that $\int\limits_0^\infty te^{-pt}=\frac{1}{p^2}$ and the same algebra as for $E(X)$ we get:

$$\frac{E(X^2)}{2} = \sum \frac {1} {p_j^2} -\sum_{i<j} \frac{1}{(p_i+p_j)^2}+\dots +(-1)^{n-1}\frac{1}{(p_1+\dots+p_n)^2} $$

Now, let's consider the special case where all coupons have an equal probability of being selected. In other words, $p_j=\frac 1 n \; \forall \; j$.

Approach-1
We get:

$$\frac{E(X^2)}{2} = n^2\left(\sum\limits_{k=1}^n (-1)^{k-1}\frac{n\choose k}{k^2}\right)$$

Per my answer to the question here, this summation yields:

$$E(X^2) = 2n^2\left( \sum_{j=1}^n\sum_{k=1}^j\frac{1}{jk}\right)\tag{1}$$

Approach-2
But per this paper, the variance for this special case is:

$$V(X) = n^2\sum_{j=1}^m\frac{1}{j^2}-n\sum_{j=1}^m\frac{1}{j} $$
and this would mean that:

$$E(X^2) = V(X)+E(X)^2 = n^2\sum_{j=1}^m\frac{1}{j^2}-n\sum_{j=1}^m\frac{1}{j}+\left(n\sum_{j=1}^m\frac{1}{j}\right)^2$$

If we visualize a $j-k$ grid, it's easy to see that this is the same as:

$$E(X^2) = 2n^2\left( \sum_{j=1}^n\sum_{k=1}^j\frac{1}{jk}\right)-n\sum_{j=1}^m\frac{1}{j}\tag{2}$$

If we compare equation (1) from approach-1 and equation (2) from approach-2, it's clear that equation (1) has a missing $-n\sum_{j=1}^m\frac{1}{j}$ term. And equation (2) has been verified using other methods. This indicates that there is some small mistake with approach-1 that is making us miss this term. I haven't been able to spot what this issue is. Hoping someone else might.

Best Answer

I eventually figured out this one. Every result in the question above is correct. It's just that the $X$ in equation (1) is the time at which all coupons will be collected if we assume that coupons arrive at a rate $\lambda=1$ in accordance with a Poisson process with each coupon arrival being type $j$ with probability $p_j$. Let $N$ be the number of coupons collected when the collection is completed. Then, we're interested in $E(N^2)$ and that is the expression equation (2) in the question is an expression for. So, we need to relate $E(X^2)$ with $E(N^2)$. First, as Ross notes,

$$E(X|N=n)=nE(T_i)$$

where $T_i$ are the inter-arrival times for coupon arrivals. Since these are assume to be exponential with rate 1,

$$E(X|N)=N\tag{1}$$

Taking expectations on both sides and using the law of total expectation we get:

$$E(X)=E(N)$$

Now, what about variance? Using the law of total variance we get:

$$V(X)=E(V(X|N))+V(E(X|N))$$

So per equation (1) we have:

$$V(X)=E(V(X|N))+V(N)\tag{2}$$

Now,

$$V(X|N)=NV(T_i)$$

And since $T_i \sim Exp(1)$, we have $V(T_i)=1$ meaning, $V(X|N)=N$.

Substituting into (2),

$$V(X)=E(N)+V(N)$$

And this extra $E(N)$ term on the LHS accounts for the missing term in the question.

Related Question