Probability – Rolling a 6-sided Die to Obtain Every Number

coupon-collector-problemdiceprobability

I've just played a game with my kids that basically boils down to: whoever rolls every number at least once on a 6-sided die wins.

I won, eventually, and the others finished 1-2 turns later. Now I'm wondering: what is the expectation of the length of the game?

I know that the expectation of the number of rolls till you hit a specific number is
$\sum_{n=1}^\infty n\frac{1}{6}(\frac{5}{6})^{n-1}=6$.

However, I have two questions:

  1. How many times to you have to roll a six-sided die until you get every number at least once?
  2. Among four independent trials (i.e. with four players), what is the expectation of the maximum number of rolls needed? [note: it's maximum, not minimum, because at their age, it's more about finishing than about getting there first for my kids]

I can simulate the result, but I wonder how I would go about calculating it analytically.


Here's a Monte Carlo simulation in Matlab

mx=zeros(1000000,1);
for i=1:1000000,
   %# assume it's never going to take us >100 rolls
   r=randi(6,100,1);
   %# since R2013a, unique returns the first occurrence
   %# for earlier versions, take the minimum of x
   %# and subtract it from the total array length
   [~,x]=unique(r); 
   mx(i,1)=max(x);
end

%# make sure we haven't violated an assumption
assert(numel(x)==6)

%# find the expected value for the coupon collector problem
expectationForOneRun = mean(mx)

%# find the expected number of rolls as a maximum of four independent players
maxExpectationForFourRuns = mean( max( reshape( mx, 4, []), [], 1) )

expectationForOneRun =
   14.7014 (SEM 0.006)

maxExpectationForFourRuns =
   21.4815 (SEM 0.01)

Best Answer

Because a "completely analytical approach" has been requested, here is an exact solution. It also provides an alternative approach to solving the question at Probability to draw a black ball in a set of black and white balls with mixed replacement conditions.


The number of moves in the game, $X$, can be modeled as the sum of six independent realizations of Geometric$(p)$ variables with probabilities $p=1, 5/6, 4/6, 3/6, 2/6, 1/6$, each of them shifted by $1$ (because a geometric variable counts only the rolls preceding a success and we must also count the rolls on which successes were observed). By computing with the geometric distribution, we will therefore obtain answers that are $6$ less than the desired ones and therefore must be sure to add $6$ back at the end.

The probability generating function (pgf) of such a geometric variable with parameter $p$ is

$$f(z, p) = \frac{p}{1-(1-p)z}.$$

Therefore the pgf for the sum of these six variables is

$$g(z) = \prod_{i=1}^6 f(z, i/6) = 6^{-z-4} \left(-5\ 2^{z+5}+10\ 3^{z+4}-5\ 4^{z+4}+5^{z+4}+5\right).$$

(The product can be computed in this closed form by separating it into five terms via partial fractions.)

The cumulative distribution function (CDF) is obtained from the partial sums of $g$ (as a power series in $z$), which amounts to summing geometric series, and is given by

$$F(z) = 6^{-z-4} \left(-(1)\ 1^{z+4} + (5)\ 2^{z+4}-(10)\ 3^{z+4}+(10)\ 4^{z+4}-(5)\ 5^{z+4}+(1)\ 6^{z+4}\right).$$

(I have written this expression in a form that suggests an alternate derivation via the Principle of Inclusion-Exclusion.)

From this we obtain the expected number of moves in the game (answering the first question) as

$$\mathbb{E}(6+X) = 6+\sum_{i=1}^\infty \left(1-F(i)\right) = \frac{147}{10}.$$

The CDF of the maximum of $m$ independent versions of $X$ is $F(z)^m$ (and from this we can, in principle, answer any probability questions about the maximum we like, such as what is its variance, what is its 99th percentile, and so on). With $m=4$ we obtain an expectation of

$$ 6+\sum_{i=1}^\infty \left(1-F(i)^4\right) \approx 21.4820363\ldots.$$

(The value is a rational fraction which, in reduced form, has a 71-digit denominator.) The standard deviation is $6.77108\ldots.$ Here is a plot of the probability mass function of the maximum for four players (it has been shifted by $6$ already):

Figure

As one would expect, it is positively skewed. The mode is at $18$ rolls. It is rare that the last person to finish will take more than $50$ rolls (it is about $0.3\%$).

Related Question