[Math] Expected value of the number of distinct results of die rolls in $N$ trials

dice, probability, statistics, variance

Given $N$ rolls of a die, let $D$ be the number of distinct outcomes. What are the mean and standard deviation of $D$?

For $k\in\{1,\dots,6\}$, define the indicator random variable $I(k)$ to equal 1 if outcome $k$ (for example, a 6) appears at least once and 0 otherwise. Then by definition
$$D = \sum\limits_{k=1}^6 I(k)$$
How do the dependencies between the $I(k)$ enter the solution? That is the part that trips me up most.

Best Answer

We approach this problem combinatorially, writing $n$ for the number of rolls $N$. Of the $6^n$ equally likely $n$-roll sequences, the number with exactly $k$ distinct values ($1\le k\le6$) is
$$D(n,k)=\binom6k k!\left\{n\atop k\right\}=\binom6k\sum_{j=0}^k(-1)^{k-j}\binom kj j^n,$$
where the Stirling number of the second kind $\left\{n\atop k\right\}$ counts the ways to partition the $n$ rolls into $k$ nonempty blocks (rolls in the same block show the same face), and $\binom6k k!$ counts the ways to assign $k$ distinct faces to those blocks. For every positive integer $n$ this expands to
$$D(n,1)=6$$ $$D(n,2)=15\cdot2^n-30$$ $$D(n,3)=-60\cdot2^n+20\cdot3^n+60$$ $$D(n,4)=90\cdot2^n-60\cdot3^n+15\cdot4^n-60$$ $$D(n,5)=-60\cdot2^n+60\cdot3^n-30\cdot4^n+6\cdot5^n+30$$ $$D(n,6)=15\cdot2^n-20\cdot3^n+15\cdot4^n-6\cdot5^n+6^n-6$$

The ratio $\frac{D(n,k)}{6^n}$ is then the probability that an $n$-roll sequence has exactly $k$ distinct values. The expected value of $D$ for a given $n$ is
$$\mu_n=\sum_{k=1}^6k\cdot\frac{D(n,k)}{6^n}=6\left(1-\left(\frac56\right)^n\right)$$
and the standard deviation is
$$\sigma_n=\sqrt{\sum_{k=1}^6\frac{D(n,k)}{6^n}(k-\mu_n)^2}=\sqrt{\frac{5\cdot144^n-6\cdot150^n+180^n}{6^{3n-1}}}$$

Note that
$$\lim_{n\to\infty}\mu_n=6\text{ and }\lim_{n\to\infty}\sigma_n=0,$$
which matches intuition: for large $n$, almost every roll sequence contains all six outcomes.
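The same moments also follow from the indicators $I(k)$ in the question, and that route shows exactly where the dependencies enter. The sketch below assumes, as above, a fair die and independent rolls. All $n$ rolls miss face $k$ with probability $\left(\frac56\right)^n$, so $E[I(k)]=1-\left(\frac56\right)^n$. Linearity of expectation does not require independence, so the dependencies play no role in the mean:
$$\mu_n=E[D]=\sum_{k=1}^6E[I(k)]=6\left(1-\left(\frac56\right)^n\right).$$
They matter only for the variance, through the covariance terms. Each indicator has $\operatorname{Var}(I(k))=\left(\frac56\right)^n\left(1-\left(\frac56\right)^n\right)$, and for $i\ne j$ inclusion-exclusion gives $P(I(i)=I(j)=1)=1-2\left(\frac56\right)^n+\left(\frac46\right)^n$, so
$$\operatorname{Cov}(I(i),I(j))=\left(\frac46\right)^n-\left(\frac{25}{36}\right)^n<0\qquad(n\ge1).$$
Summing the $6$ variances and the $6\cdot5=30$ ordered pairs $i\ne j$,
$$\sigma_n^2=6\left[\left(\frac56\right)^n-\left(\frac{25}{36}\right)^n\right]+30\left[\left(\frac46\right)^n-\left(\frac{25}{36}\right)^n\right]=\frac{6\cdot180^n+30\cdot144^n-36\cdot150^n}{216^n},$$
which equals $\frac{5\cdot144^n-6\cdot150^n+180^n}{6^{3n-1}}$ after dividing numerator and denominator by $6$. Because the covariances are negative, $\sigma_n$ is smaller than it would be if the $I(k)$ were independent.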

The SymPy code that generated these results can be found here.
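A sketch of such a check, independent of the linked SymPy code and using only Python's standard library, is below. It assumes a fair die, and all function names (`d_closed_form`, `d_brute_force`, `verify`, and so on) are hypothetical. It compares the closed form for $D(n,k)$ with brute-force enumeration and confirms that $\mu_n$ and $\sigma_n^2$ computed from the distribution $D(n,k)/6^n$ agree with the closed forms above.

```python
from fractions import Fraction
from itertools import product
from math import comb, sqrt

FACES = 6  # assumption: a fair six-sided die with independent rolls


def d_closed_form(n: int, k: int) -> int:
    """Sequences of n rolls with exactly k distinct faces:
    C(6, k) * sum_{j=0}^{k} (-1)^(k-j) * C(k, j) * j^n."""
    return comb(FACES, k) * sum(
        (-1) ** (k - j) * comb(k, j) * j**n for j in range(k + 1)
    )


def d_brute_force(n: int) -> list[int]:
    """Same counts by enumerating all 6^n sequences (small n only).
    Index k of the returned list holds the count for k distinct faces."""
    counts = [0] * (FACES + 1)
    for rolls in product(range(1, FACES + 1), repeat=n):
        counts[len(set(rolls))] += 1
    return counts


def mean_closed_form(n: int) -> Fraction:
    # mu_n = 6 * (1 - (5/6)^n)
    return FACES * (1 - Fraction(5, 6) ** n)


def variance_closed_form(n: int) -> Fraction:
    # sigma_n^2 = (5*144^n - 6*150^n + 180^n) / 6^(3n-1), for n >= 1
    return Fraction(5 * 144**n - 6 * 150**n + 180**n, 6 ** (3 * n - 1))


def moments_from_counts(n: int) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of D from the distribution D(n,k)/6^n."""
    total = FACES**n
    probs = {k: Fraction(d_closed_form(n, k), total) for k in range(1, FACES + 1)}
    mean = sum(k * p for k, p in probs.items())
    var = sum(p * (k - mean) ** 2 for k, p in probs.items())
    return mean, var


def verify(max_n: int = 6) -> None:
    """Check the closed forms against brute force for n = 1..max_n."""
    for n in range(1, max_n + 1):
        brute = d_brute_force(n)
        for k in range(1, FACES + 1):
            assert brute[k] == d_closed_form(n, k), (n, k)
        mean, var = moments_from_counts(n)
        assert mean == mean_closed_form(n), n
        assert var == variance_closed_form(n), n


if __name__ == "__main__":
    verify()
    for n in (1, 2, 10, 50):
        print(n, float(mean_closed_form(n)), sqrt(float(variance_closed_form(n))))
```

Exact `Fraction` arithmetic lets the closed forms be compared with `==` instead of a floating-point tolerance. Brute force grows as $6^n$, so `verify` is only meant for small $n$.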