Probability in Dice Rolls – What is the Probability of Rolling the Same Number 5 Times in a Row?

dicemarkov-processprobability

A fair die is rolled 1,000 times. What is the probability of rolling the same number 5 times in a row? How do you solve this type of question for variable number of throws and number of repeats?

Best Answer

Below we compute the probability in four ways:

Computation with Markov Chain          0.473981098314993
Computation with generating function   0.473981098314988
Estimation false method                0.536438013618686
Estimation correct method              0.473304632462677

The first two are exact methods and differ only a little (probably some round of errors), the third method is a naive estimation that does not give the correct number, the fourth method is better and gives a result that is very close to the exact method.

Computationally:

Markov Chain

You can model this computationally with a transition matrix

Say the column vector $X_{k,j} = \lbrace x_1,x_2,x_3,x_4,x_5 \rbrace_{j}$ is the probability to have $k$ of the same numbers in a row in the $j$-th dice roll. Then (when assuming a 6-sided dice)

$$X_{k,j} = M \cdot X_{k,j-1}$$ with

$$M = \begin{bmatrix} \frac{5}{6} & \frac{5}{6} & \frac{5}{6} & \frac{5}{6} & 0 \\ \frac{1}{6} & 0& 0 & 0 & 0 \\ 0& \frac{1}{6} & 0& 0 & 0 \\ 0 & 0& \frac{1}{6} & 0& 0 \\ 0&0 & 0& \frac{1}{6} & 1 \\ \end{bmatrix}$$

where this last entry $M_{5,5} = 1$ relates to 5 of the same in a row being an absorbing state where we 'stop' the experiment.

After the first roll you will be certainly in state 1 (there's certainly only 1 of the same number in a row).

$$X_{k,1} = \lbrace 1,0,0,0,0 \rbrace$$

After the $j$-th roll this will be multiplied with $M$ a $j-1$ times

$$X_{k,j} = M^{j-1} \lbrace 1,0,0,0,0 \rbrace$$

R-Code:

library(matrixcalc) ### allows us to use matrix.power

M <- matrix(c(5/6, 5/6, 5/6, 5/6, 0,
              1/6, 0  , 0  , 0  , 0,
              0,   1/6, 0  , 0  , 0,
              0,   0  , 1/6, 0  , 0,
              0,   0  , 0  , 1/6, 1),
            5, byrow = TRUE)

start <- c(1,0,0,0,0)
matrix.power(M,999) %*% start

The result is $$X_{k,1000} = \begin{bmatrix} 0.438631855\\ 0.073152468\\ 0.012199943\\ 0.002034635\\ \color{red}{0.473981098}\end{bmatrix}$$

and this last entry 0.473981098 is the probability to roll the same number 5 times in a row in 1000 rolls.

generating function

Our question is:

  • How to calculate the probability of rolling any number at least $k$ times in a row, out of $n$ tries?

This is equivalent to the question

  • How to calculate the probability of rolling the number 6 at least $k-1$ times in a row, out of $n-1$ tries?

You can see it as tracking whether the dice roll $m$ is the same number as the number of the dice roll $m-1$ (which has 1/6-th probabilty). And this needs to happen $k-1$ times in a row (in our case 4 times).

In this Q&A the alternative question is solved as a combinatorial problem: How many ways can we roll the dice $n$ times without the number '6' occuring $k$ or more times in a row.

This is found by finding all possible combinations of ways that we can combine the strings 'x', 'x6', 'x66', 'x666' (where 'x' is any number 1,2,3,4,5) into a string of length $n+1$ ($n+1$ instead of $n$ because in this way of constructing strings the first letter is always $x$ here). In this way we counted all possibilities to make a string of length $n$ but with only 1, 2, or 3 times a 6 in a row (and not 4 or more times).

Those combinations can be found by using an equivalent polynomial. This is very similar to the binomial coefficients which relate to the coefficients when we expand the power $(x+y)^n$, but it also relates to a combination.

The polynomial is

$$\begin{array}{rcl} P(x) &=& \sum_{k=0}^\infty (5x+5x^2+5x^3+5x^4)^k\\ &=& \frac{1}{1-(5x+5x^2+5x^3+5x^4)} \\ &=& \frac{1}{1-5\frac{x-x^5}{1-x}}\\ &=& \frac{1-x}{1-6x+5x^5} \end{array}$$

The coefficient of the $x^n$ relates to the number of ways to arrange the numbers 1,2,3,4,5,6 in a string of length $n-1$ without 4 or more 6's in a row. This coefficient can be found by a recursive relation. $$P(x) (1-6x+5x^5) = 1-x$$ which implies that the coefficients follow the relation

$$a_n - 6a_{n-1} + 5 a_{n-5} = 0$$

and the first coefficients can be computed manually

$$a_1,a_2,a_3,a_4,a_5,a_6,a_7 = 5,30,180,1080,6475,38825,232800$$

With this, you can compute $a_{1000}$ and $1-a_{1000}/6^{999}$ will be the probability to roll the same number 5 times in a row 5.

In the R-code below we compute this (and we include a division by 6 inside the recursion because the numbers $a_{1000}$ and $6^{999}$ are too large to compute directly). The result is $0.473981098314988$, the same as the computation with the Markov Chain.

x <- 6/5*c(5/6,30/6^2,180/6^3,1080/6^4,6475/6^5,38825/6^6,232800/6^7)
for (i in 1:1000) {
  t <- tail(x,5)
  x <- c(x,(6/6*t[5]-5/6^5*t[1]))   ### this adds a new number to the back of the vector x
}
1-x[1000]

Analytic/Estimate

Method 1: wrong

You might think, the probability to have in any set of 5 neighboring dices, 5 of the same numbers, is $\frac{1}{6^4} = \frac{1}{1296}$, and since there are 996 sets of 5 neighboring dices the probability to have in at least one of these sets 5 of the same dices is:

$$ 1-(1-\frac{1}{6^4})^{996} \approx 0.536$$

But this is wrong. The reason is that the 996 sets are overlapping and not independent.

Method 2: correct

A better way is to approximate the Markov chain that we computed above. After some time you will get that the occupation of the states, with 1,2,3,4 of the same number in a row, are more or less stable and the ratio's will be roughly $1/6,1/6^2,1/6^3,1/6^4$ (*). Thus the fraction of the time that we have 4 in a row is:

$$\text{frequency 4 in a row} = \frac{1/6^4}{1/6+1/6^2+1/6^3+1/6^4}$$

If we have these 4 in a row then we have a 1/6-th probability to finish the game. So the frequency of finishing the game is

$$\text{finish-rate} = \frac{1}{6} \text{frequency 4 in a row} = \frac{1}{1554}$$

and the probability to be finished after $k$ steps is approximately

$$P_k \approx 1-(1-\frac{1}{1554})^{k-4} \underbrace{\approx 0.47330}_{\text{if $k=1000$}}$$

much closer to the exact computation.


(*) The occupation in state $k$ during roll $j$ will relate to the occupation in state $k-1$ during roll $j-1$. We will have $x_{k,j} = \frac{1}{6} x_{k-1,j-1} \approx \frac{1}{6} x_{k-1,j}$. Note that this requires that you have $x_{k-1,j} \approx x_{k-1,j-1}$, which occurs when the finish-rate is small. If this is not the case, then you could apply a factor to compensate, but the assumption of relatively steady ratio's will be wrong as well.

Related problems

This latter related problem gives a different approximation based on expectation values and estimates the distribution as an overdispersed Poisson distribution. Giving an approximation $1- \exp \left(-(1000-5+1)\left(\frac{1}{6^4}\right) /1.2 \right)\approx 0.4729354$ which isn't bad either.

Related Question