Number of trials rolling six 6-sided dice to get 6 unique values

diceexpected valueprobability

I have six 6-sided dice, and I want to roll them until I have all six distinct values (assume them to be naturals 1-6), in no particular order.

The naive strategy would be to roll all 6 dice until I have all 6 values. The expected number of trials until success is $\frac{6^6}{6!} = 64.8$.

A smarter strategy would be to roll a die until its value is distinct from the values of other dice already rolled. Eg. the first die gets rolled once, its value is 3, and then I roll the second one until its value is something else than 3, say 4. Then I roll the 3rd one until its value is neither 3 nor 4, etc. This way, the expected number of trials is $\sum_{i=1}^{6}\frac{6}{i} = 14.7$.

An even smarter strategy would be to first roll all 6 dice, then pick the duplicates ‒ all except one ‒ and roll these again. Repeat untill I have all 6 values. For example:

  • first roll: 1 1 2 4 4 4 (the order doesn't matter)
  • pick 1 4 4, leave 1 2 4 on the table
  • roll the 3 dice (which had values 1 4 4) again
  • repeat this until there are values 1-6 on the table

Question: what is the expected number of trials with this strategy?

Best Answer

As I outlined in a comment, we denote by $t_k, 1 \leq k \leq 6$, the expected number of further rolls required to get six distinct numbers when there are currently $k$ distinct numbers. So, for example, $t_6 = 0$, because there are already six distinct numbers.

If there are currently $k = 5$ distinct numbers, we pick up the duplicate and reroll it. There are two possibilities:

  • The reroll is also a duplicate, with probability $\frac56$.
  • The reroll is the last unrolled number, with probability $\frac16$.

This allows us to write the recurrence

$$ t_5 = 1 + \frac56 t_5 $$

which can be solved to yield $t_5 = 6$. If there are $k = 4$ distinct numbers, we pick up the two duplicates and reroll them. There are four possibilities:

  • Both rerolls are still duplicates, with probability $\left(\frac23\right)^2 = \frac49$.
  • One reroll is a duplicate, but the other is a new number, with probability $2\left(\frac23\right)\left(\frac13\right) = \frac49$.
  • Both rerolls are the same new number, with probability $2\left(\frac16\right)^2 = \frac{1}{18}$.
  • The rerolls are the last two numbers, also with probability $2\left(\frac16\right)^2 = \frac{1}{18}$.

The second and third cases both yield five distinct numbers, so we can write the recurrence

$$ t_4 = 1 + \frac49 t_4 + \left(\frac49+\frac{1}{18}\right) t_5 $$

Plugging in $t_5 = 6$ reduces this to

$$ t_4 = 4 + \frac49 t_4 $$

which yields $t_4 = \frac{36}{5}$. In general, we may write

$$ t_k = 1 + \sum_{j=k}^5 p_{kj} t_j $$

where $p_{kj}$ is the probability of going from $k$ distinct numbers to $j \geq k$ distinct numbers in a single roll. There's probably an explicit summation form for this, but I'm afraid I'm too lazy to think of it at the present time. At any rate, we can continue along in the same vein to write

\begin{align} t_3 & = 1 + \frac18 t_3 + \frac{37}{72} t_4 + \frac13 t_5 \\ & = 1 + \frac18 t_3 + \frac{37}{10} + 2 \\ & = \frac{67}{10} + \frac18 t_3 \end{align}

yielding $t_3 = \frac{268}{35}$, then

\begin{align} t_2 & = 1 + \frac{1}{81} t_2 + \frac{65}{324} t_3 + \frac{55}{108} t_4 + \frac{7}{27} t_5 \\ & = 1 + \frac{1}{81} t_2 + \frac{871}{567} + \frac{11}{3} + \frac{14}{9} \\ & = \frac{4399}{567} + \frac{1}{81} t_2 \end{align}

yielding $t_2 = \frac{4399}{560}$, then

\begin{align} t_1 & = 1 + \frac{1}{7776} t_1 + \frac{155}{7776} t_2 + \frac{25}{108} t_3 + \frac{325}{648} t_4 + \frac{25}{108} t_5 \\ & = 1 + \frac{1}{7776} t_1 + \frac{136369}{870912} + \frac{335}{189} + \frac{65}{18} + \frac{25}{18} \\ & = \frac{986503}{124416} + \frac{1}{7776} t_1 \end{align}

yielding $t_1 = \frac{986503}{124400}$. (Thanks to @user in the comments for noticing an error in my original computation!) Finally, we observe that if $k = 1$ (that is, if you only have one distinct number), you're essentially right where you started, so the overall expected number of rolls until you get six distinct numbers is

$$ t = t_1 = \frac{986503}{124400} \approx 7.93009 $$

There may be a simpler and cleverer way to this answer.

Related Question