We roll a die until we have obtained all the numbers from $1$ to $6$. I found the expected number of rolls by writing $X = X_1 + \dots + X_6$, where $X_i$ is the number of rolls needed to obtain a result different from the previous $i-1$ distinct results, and using the geometric distribution; my result is correct. Then I wanted to find the variance. At first I thought of doing it this way: $$\text{Var} (X_1 + \dots + X_6) = \text{Var}(X_1) + \dots + \text{Var}(X_6) + 2 \sum_{1\le i<j\le6} \text{Cov}(X_i, X_j),$$ but the covariances are not easy to find here. Can somebody please show me how to find the variance of the number of dice rolls?
Probability – Variance of Rolling a Die Until All Numbers Appear
coupon-collector, dice, probability
Related Solutions
Here's a heavy-handed approach. After zero or more rolls, you are in one of four situations:
$\begin{align} \emptyset:&\qquad\textrm{No 5 or even rolled yet.}\\ E:&\qquad\textrm{Even was rolled, but no 5 yet.}\\ 5:&\qquad\textrm{A 5 was rolled, but no even yet.}\\ *:&\qquad\textrm{Both 5 and even have been rolled. Game over.}\\ \end{align}$
The transition matrix of probabilities between each pair of situations is easy to compute:
$\begin{array}{l|cccc} \nearrow&\emptyset&E&5&*\\ \hline \emptyset&\frac{1}{3}&\frac{1}{2}&\frac{1}{6}&0\\ E&0&\frac{5}{6}&0&\frac{1}{6}\\ 5&0&0&\frac{1}{2}&\frac{1}{2}\\ *&0&0&0&1\\ \end{array}$
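As a cross-check on the table above, the probabilities can be rebuilt by enumerating the six faces from each situation. This is only a sketch: the state labels (`"0"` standing in for $\emptyset$) and the helper names `next_state` and `transition_matrix` are made up for illustration and do not come from the answer.

```python
from fractions import Fraction

# Illustrative labels: "0" = neither seen, "E" = even seen only,
# "5" = five seen only, "*" = both seen (absorbing).
STATES = ["0", "E", "5", "*"]

def next_state(state, face):
    """Return the situation reached from `state` after rolling `face`."""
    seen_even = state in ("E", "*") or face % 2 == 0
    seen_five = state in ("5", "*") or face == 5
    if seen_even and seen_five:
        return "*"
    if seen_even:
        return "E"
    if seen_five:
        return "5"
    return "0"

def transition_matrix():
    """Tabulate transition probabilities by enumerating the six equally likely faces."""
    P = {s: {t: Fraction(0) for t in STATES} for s in STATES}
    for s in STATES:
        for face in range(1, 7):
            P[s][next_state(s, face)] += Fraction(1, 6)
    return P

P = transition_matrix()
for s in STATES:
    # Each printed row should match the corresponding row of the table.
    print(s, [str(P[s][t]) for t in STATES])
```

Using `Fraction` keeps every entry exact, so the rows can be compared with the table directly.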
So this is now modeled as an absorbing Markov chain with transition matrix
$\left({\begin{array}{cccc} \frac{1}{3}&\frac{1}{2}&\frac{1}{6}&0\\ 0&\frac{5}{6}&0&\frac{1}{6}\\ 0&0&\frac{1}{2}&\frac{1}{2}\\ 0&0&0&1\\ \end{array}}\right)$
The final state being listed last, the behavior is characterized by the $3\times3$ matrix in the upper left, which is the transition matrix for the non-final states.
$Q=\left({\begin{array}{ccc} \frac{1}{3}&\frac{1}{2}&\frac{1}{6}\\ 0&\frac{5}{6}&0\\ 0&0&\frac{1}{2}\\ \end{array}}\right)$
The so-called fundamental matrix $N$ for this chain is
$N=(I-Q)^{-1} =\left({\begin{array}{ccc} \frac{3}{2}&\frac{9}{2}&\frac{1}{2}\\ 0&6&0\\ 0&0&2\\ \end{array}}\right) $.
The expected number of steps from the $i$-th state to the final one is the sum of the entries of the $i$-th row of $N$, or equivalently the $i$-th entry of the matrix
${\bf t}=N\mathbb{1}=\left({\begin{array}{c} \frac{13}{2}\\ 6\\ 2\\ \end{array}}\right)$,
so for the starting state $\emptyset$, it's $\frac{13}{2}$ steps.
The variance of the number of steps from the $i$-th state is the $i$-th entry in the matrix
$(2N-I)\mathbf{t}-\mathbf{t}_{\textrm{sq}}$,
where $\mathbf{t}_{\textrm{sq}}$ is the matrix $\mathbf{t}$ with each entry squared. If I didn't slip up with Mathematica,
$(2N-I)\mathbf{t}-\mathbf{t}_{\textrm{sq}}=\left({\begin{array}{c} \frac{107}{4}\\ 30\\ 2\\ \end{array}}\right)$,
and the variance you want is $\frac{107}{4}$.
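The matrix arithmetic above can be reproduced without Mathematica. The sketch below uses exact rational arithmetic from Python's standard library; the helper `mat_inverse` is a hypothetical Gauss-Jordan routine written for this illustration, not part of the original answer.

```python
from fractions import Fraction as F

def mat_inverse(A):
    """Invert a small nonsingular square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place (assumes one exists).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Transition matrix restricted to the three non-final states.
Q = [[F(1, 3), F(1, 2), F(1, 6)],
     [F(0),    F(5, 6), F(0)],
     [F(0),    F(0),    F(1, 2)]]
n = len(Q)

# Fundamental matrix N = (I - Q)^(-1).
N = mat_inverse([[F(int(i == j)) - Q[i][j] for j in range(n)] for i in range(n)])

# Expected steps to absorption: row sums of N.
t = [sum(row) for row in N]

# Variance of steps: (2N - I) t - t_sq, computed entry by entry.
var = [sum((2 * N[i][j] - F(int(i == j))) * t[j] for j in range(n)) - t[i] ** 2
       for i in range(n)]

print([str(x) for x in t])    # expected steps from each non-final state
print([str(x) for x in var])  # variance of steps from each non-final state
```

The first entries of `t` and `var` correspond to the starting state $\emptyset$.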
Somewhere you stopped applying the factor $\frac{1}{n^2}$ to the right-hand side of your expressions.
You should have ended up with $$\text{Var}(\bar{X}) =\frac{1}{n^2}\left(n\text{Var}(X)+2 \frac{n(n-1)}{2} \rho\text{Var}(X)\right) =\frac{1+(n-1)\rho}{n} \text{Var}({X})$$
which, as expected, is $\frac{1}{n} \text{Var}({X})$ when $\rho=0$, and is $\text{Var}({X})$ when $\rho=1$.
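A quick way to check the closed form is to sum all $n^2$ entries of the covariance matrix directly and divide by $n^2$. The sketch below assumes, as the formula does, that every variable has the same variance and every pair has the same correlation $\rho$; the function names are illustrative only.

```python
from fractions import Fraction

def var_of_mean(n, var_x, rho):
    """Variance of the sample mean: sum of all n*n covariance entries, divided by n^2."""
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            # Diagonal entries are variances; off-diagonal entries are rho * Var(X).
            total += var_x if i == j else rho * var_x
    return total / n ** 2

def closed_form(n, var_x, rho):
    """The closed form (1 + (n - 1) * rho) / n * Var(X)."""
    return Fraction(1 + (n - 1) * rho) * var_x / n

# Example with n = 5, Var(X) = 3, rho = 1/4: both routes should agree.
print(var_of_mean(5, Fraction(3), Fraction(1, 4)), closed_form(5, Fraction(3), Fraction(1, 4)))
```

Setting `rho` to `0` or `1` recovers the two limiting cases mentioned above.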
Best Answer
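Just to give an explicit answer, so that a later question can be pointed here:
The variance for the coupon collector's problem of collecting all $n$ distinct and equally likely coupons is simply the sum of the variances of the $n$ geometric distributions, because the waiting times $X_i$ are independent and so every covariance term vanishes. When $k$ coupons are still missing the success probability is $\frac k n$, and the corresponding geometric variance is $\left(\frac n k\right)^2 - \frac n k$, so the total is $$\sum_{k=1}^n \left(\left(\frac n k\right)^2 - \frac n k\right)$$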
With $n=6$ this variance is exactly $38.99$ so the standard deviation is about $6.2441973$, which seems quite large when you consider that the expectation is $14.7$. Of the $38.99$, the part of the variance associated with collecting the final coupon is $30$.
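The figures for $n=6$ can be confirmed with exact fractions. This is a minimal sketch; `collector_mean` and `collector_variance` are names chosen here for illustration.

```python
from fractions import Fraction
from math import sqrt

def collector_mean(n):
    """Expected number of draws: sum of n/k over k = 1..n."""
    return sum(Fraction(n, k) for k in range(1, n + 1))

def collector_variance(n):
    """Variance of the number of draws: sum of (n/k)^2 - n/k over k = 1..n."""
    return sum(Fraction(n, k) ** 2 - Fraction(n, k) for k in range(1, n + 1))

mean6 = collector_mean(6)      # 147/10, i.e. 14.7
var6 = collector_variance(6)   # 3899/100, i.e. 38.99
last_coupon = Fraction(6, 1) ** 2 - Fraction(6, 1)  # k = 1 term: 30

print(mean6, var6, last_coupon, sqrt(var6))
```

The $k=1$ term is the wait for the final missing face, which accounts for most of the variance.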
An approximation for the variance is $$\frac{\pi^2}{6}n^2 - (\log_e(n)+1+\gamma)n - \frac{1}{12 n}$$ where $\gamma \approx 0.5772156649$ is the Euler-Mascheroni constant. With $n=6$ this is about $38.9898867$.
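To see how close the approximation is, it can be evaluated next to the exact sum. The sketch below uses floating point throughout; the function names and the hard-coded value of $\gamma$ are choices made for this illustration.

```python
from math import log, pi

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def approx_variance(n):
    """Approximation: (pi^2/6) n^2 - (ln n + 1 + gamma) n - 1/(12 n)."""
    return (pi ** 2 / 6) * n ** 2 - (log(n) + 1 + EULER_GAMMA) * n - 1 / (12 * n)

def exact_variance(n):
    """Exact coupon-collector variance: sum of (n/k)^2 - n/k over k = 1..n."""
    return sum((n / k) ** 2 - n / k for k in range(1, n + 1))

for n in (6, 20, 100):
    # Print n, the exact value, the approximation, and their difference.
    print(n, exact_variance(n), approx_variance(n), exact_variance(n) - approx_variance(n))
```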