A fair die is rolled 10 times. Define N to be the number of distinct outcomes. For example, if we roll (3, 2, 2, 1, 4, 3, 1, 6, 2, 3) then N = 5 (the distinct outcomes being 1, 2, 3, 4 and 6). Find the mean and standard deviation of N.
Hint: Define $I_j =
\left\{
\begin{array}{ll}
1 & \text{if outcome j appears at least once} \\
0 & \text{otherwise}
\end{array}
\right.$
and $j=1,…,6$. Then $N=\sum_{j=1}^{6} I_j$ where $I_1,…I_6$ are dependent.
Any help would be much appreciated!
Best Answer
Supposing that the fair die has $n$ sides and is rolled $m$ times we get for the probability of $k$ distinct outcomes by first principles the closed form
$$\frac{1}{n^m} \times m! [z^m] {n\choose k} (\exp(z)-1)^k.$$
Let us verify that this is a probability distribution. We obtain
$$\frac{1}{n^m} \times m! [z^m] \sum_{k=0}^n {n\choose k} (\exp(z)-1)^k.$$
Here we have included the value for $k=0$ because it is a constant with respect to $z$ and hence does not contribute to $[z^m]$ where $m\ge 1.$ Continuing we find
$$\frac{1}{n^m} \times m! [z^m] \exp(nz) = 1$$
and the sanity check goes through. Now for the expectation we get
$$\mathrm{E}[N] = \frac{1}{n^m} \times m! [z^m] \sum_{k=1}^n k {n\choose k} (\exp(z)-1)^k \\ = \frac{n}{n^m} \times m! [z^m] \sum_{k=1}^n {n-1\choose k-1} (\exp(z)-1)^k \\ = \frac{n}{n^m} \times m! [z^m] (\exp(z)-1) \sum_{k=0}^{n-1} {n-1\choose k} (\exp(z)-1)^k \\ = \frac{n}{n^m} \times m! [z^m] (\exp(z)-1) \exp((n-1)z).$$
This is
$$\frac{n}{n^m} \times (n^m - (n-1)^m) = n \left(1-\left(1-\frac{1}{n}\right)^m\right).$$
This goes to $n$ in $m$ as it ought to. For the next factorial moment we obtain
$$\mathrm{E}[N(N-1)] = \frac{1}{n^m} \times m! [z^m] \sum_{k=2}^n k (k-1) {n\choose k} (\exp(z)-1)^k \\ = \frac{n(n-1)}{n^m} \times m! [z^m] \sum_{k=2}^n {n-2\choose k-2} (\exp(z)-1)^k \\ = \frac{n(n-1)}{n^m} \times m! [z^m] (\exp(z)-1)^2 \sum_{k=0}^{n-2} {n-2\choose k} (\exp(z)-1)^k \\ = \frac{n(n-1)}{n^m} \times m! [z^m] (\exp(z)-1)^2 \exp((n-2)z).$$
This is
$$\frac{n(n-1)}{n^m} (n^m - 2(n-1)^m + (n-2)^m) \\ = n(n-1) \left(1-2\left(1-\frac{1}{n}\right)^m + \left(1-\frac{2}{n}\right)^m \right).$$
We get for the variance
$$\mathrm{Var}(N) = \mathrm{E}[N^2] - \mathrm{E}[N]^2 = \mathrm{E}[N(N-1)] + \mathrm{E}[N] - \mathrm{E}[N]^2$$
which yields
$$n^2 \left(1-2\left(1-\frac{1}{n}\right)^m + \left(1-\frac{2}{n}\right)^m \right) \\ - n \left(1-2\left(1-\frac{1}{n}\right)^m + \left(1-\frac{2}{n}\right)^m \right) + n \left(1-\left(1-\frac{1}{n}\right)^m\right) \\ - n^2 \left(1-2\left(1-\frac{1}{n}\right)^m + \left(1-\frac{1}{n}\right)^{2m} \right) \\ = n^2 \left(\left(1-\frac{2}{n}\right)^m - \left(1-\frac{1}{n}\right)^{2m} \right) + n \left(\left(1-\frac{1}{n}\right)^m - \left(1-\frac{2}{n}\right)^m \right).$$
The absolute value of the two differences of terms that are geometric in $m$ with positive common ratio less than one is bounded by the maximum of these two, which vanishes in $m$ and hence the variance goes to zero for $n$ fixed and $m$ going to infinity.
The queried standard deviation is then given by the root of the variance.
Here is some Maple code to explore these numbers.
Addendum. As pointed out by @JMoravitz we may need some clarification of the reasoning by which the probabilities are obtained. Note that $k$ distinct outcomes first of all require a choice of these $k$ values from the $n$ possibilities, for a factor of ${n\choose k}.$ Now we need to match up each of these $k$ values (think of them as listed sequentially from smallest to largest) with a subset of the $m$ rolls which may not be empty (this is what prevents double counting here). This yields the labeled combinatorial class (labeled means EGFs)
$$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \textsc{SEQ}_{=k}(\textsc{SET}_{\ge 1}(\mathcal{Z}))$$
or alternatively (compare to Stirling numbers of the second kind)
$$\textsc{SEQ}_{=k}(\textsc{SET}_{=1}(\mathcal{Z}) + \textsc{SET}_{=2}(\mathcal{Z}) + \textsc{SET}_{=3}(\mathcal{Z}) + \cdots).$$
We get the generating function
$$\left(z+\frac{z^2}{2!}+\frac{z^3}{3!}+\frac{z^4}{4!}+\cdots\right)^k = (\exp(z)-1)^k$$
and may then continue as before.
Addendum II. Another combinatorial class we can use here is
$$\textsc{SEQ}_{=n}(\mathcal{E} + \mathcal{U}\times\textsc{SET}_{\ge 1}(\mathcal{Z})).$$
This gives the EGF
$$G(z, u) = (1+u(\exp(z)-1))^n = \sum_{k=0}^n {n\choose k} u^k (\exp(z)-1)^k.$$
On evaluating
$$\left. \frac{\partial}{\partial u} G(z, u) \right|_{u=1}$$
using the second form we get the same sum as before. We can also give an alternate evaluation
$$\frac{1}{n^m} \times m! [z^m] \left. \frac{\partial}{\partial u} G(z, u) \right|_{u=1} = \frac{n}{n^m} \times m! [z^m] \exp((n-1)z) (\exp(z) - 1) \\ = \frac{n}{n^m} \times m! [z^m] (\exp(nz) - \exp((n-1)z)) = \frac{n}{n^m} (n^m - (n-1)^m) \\ = n \left(1-\left(1-\frac{1}{n}\right)^m\right).$$
For the next factorial moment we need another derivative which is
$$n(n-1)(1+u(\exp(z)-1))^{n-2} (\exp(z)-1)^2.$$
Set $u=1$ to get
$$n(n-1) \exp((n-2)z)(\exp(2z)-2\exp(z)+1).$$
Extracting coefficients now yields
$$\frac{n(n-1)}{n^m} \left(n^m - 2 (n-1)^m + (n-2)^m\right)$$
and the rest is as in the first version. Note that we get a faster enumeration routine if we admit the classification from the introduction. The result is midway between enumeration and the closed form and is shown below.