Expected number of points won per round in dice game for player who wins

expected valueprobabilityrandom variables

Let's play a dice game where each of $2$ players rolls one 6-sided dice each turn. The player who rolls the highest each turn gets paid what they rolled subtracted by what the other player rolled e.g. if Player $1$ rolls a $6$ and Player $2$ rolls a $2$, then Player $1$ earns $6 – 2 = 4$ points. No points for anyone in the case of a tie. The game ends after a player hits $20$ points (who wins the game), or $10$ rounds have passed. If nobody has hit $20$ points after $10$ rounds, then the game simply ends with no winner.

Question. How many points per round on average does the winning player earn?

My thoughts. Instinctively, the answer should be $20$ divided by the number of rounds played. But that ignores that we could end on $20$, $21$, $22$, $23$, $24$. So we may have to work backwards from those $6$ results and tally up round by round. 

Clarification. I'm interested in the expected value of the winner's score divided by the number of rounds played, conditioned on somebody actually winning. I realize there's two possible interpretations of the problem:

  1. Winner was at $19$ points, then scored $3$ more in the final round, hence a final score of $22$.
  2. Winner was at $19$ points, then scored $3$ more in the final round, but their final score gets capped out at $20$.

I'm interested in seeing the solution for both interpretations of the problem. 

Best Answer

First I'll focus on the situation where the score is not capped at 20, i.e. your case #1. I'll say how to handle case 2 afterward.

Also, a warning: I'm presenting a way to compute the exact answer using a Python script, not a solution via simple closed-form equations. I kinda suspect that the second type of answer doesn't exist or at least is quite hard to find for this problem.

Setting up equations

Let's think about calculating the expected value of (player A's score) / (turns played) conditioned on A winning the game. To do this I'll use dynamic programming. That's basically a fancy programming way to say I'll write down a recurrence relation and then make my computer plug into the recurrence over and over to get the answer.

Let $T$ be the number of turns played in the game. Let $A$ be A's final score. Define a function $f(a,b,t,x)$ to be the probability of taking $t$ more turns to end up with A winning with final score $x$, starting with A having $a$ points and B having $b$ points. Now we can use Wikipedia's formula for expected value conditioned on an event to rewrite the expected value you asked for:

$$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \sum_{r} r \frac{P[\frac A T = r \text{ and } A \text{ wins}]}{P(A)} \\ &= \frac{\sum_{t=0}^{10} \sum_{x=20}^{24} \frac{x}{t} f(0,0,t,x)}{\sum_{t=0}^{10} \sum_{x=20}^{24} f(0,0,t,x)} \end{align}$$

That formula will let us compute the answers you want if we have a way to compute $f(0,0,t,x)$ for various values of $t$ and $x$. With that in mind, let's think about how to compute various values of $f$.

To compute $f(a,b,t,x)$ I'll use recursion, first computing the values with small values of $t$ and then working my way up. The base cases are:

  • For $a \ge 20$ and $b < 20$, $f(a,b,0,a) = 1$ because the game is already over and it worked out the way we wanted.
  • All remaining cases with $t=0$ have $f(a,b,t,x) = 0$ because the game is over but we didn't get the result we wanted.
  • All remaining cases with $a \ge 20$ and/or $b \ge 20$ have $f(a,b,t,x) = 0$ because someone already won but we didn't get A winning with the correct turn count + score.

Now let's build a recursive formula by thinking about what will happen on the next turn. We'll have $$\begin{align} f(a,b,t,x) &= \sum_{c=1}^5 f(a+c, b, t-1, x) P[A \text{ scores } c \text{ on the next turn}] \\ &\quad \quad + \sum_{c=1}^5 f(a, b+c, t-1, x) P[B \text{ scores } c \text{ on the next turn}] \\ &\quad \quad + f(a, b, t-1, x) P[\text{no points scored on the next turn}] \\ \end{align}$$

The final "pencil and paper" step is to compute the probabilities of each outcome for the next turn. Once we have that, the rules above are actually already enough to compute $f(a,b,x,t)$ for all the values we care about.

There are 36 possible combinations of die rolls on the next turn. There's 1 combination where A scores 5 points ((6,1)), 2 combinations where A scores 4 points ((6,2) or (5,1)), and so on; in general, the chance of A scoring $c$ points is $\frac{6-c}{36}$. The chance of B scoring $c$ points is the same. Finally, the chance of a tie is $\frac 1 6$.

Sample implementation in Python

Here is a sample Python implementation of the above formulas. In the interest of making the code readable, I tried to Python code. I hope it's readable; I just directly implemented the formulas I typed above. Remember that the key idea is to first set up the base cases with $t=0$, then use the recursive equation to solve the cases with $t=1$ by plugging in the already-computed $t=0$ answers, then solve the $t=2$ cases by using the $t=1$ answers, and so on. Anyway, feel free to ask questions about this if you have them.

from fractions import Fraction as frac
f = [[[[frac(0) for x in range(25)]
       for t in range(11)]
      for b in range(25)]
     for a in range(25)]

# base cases.
# note that every value we don't set here will be initialized as 0.
for a in range(20, 25):
    for b in range(20):
        f[a][b][0][a] = frac(1)

# now start applying the recurrence relation.
for t in range(1, 11):
    for a in range(20):
        for b in range(20):
            for x in range(20, 25):
                # First group of terms: cases where A scores points.
                for c in range(1, 6):
                    prob = frac(6-c, 36)
                    ff = f[a+c][b][t-1][x]
                    f[a][b][t][x] += prob * ff
                
                # Next group: B scores
                for c in range(1, 6):
                    prob = frac(6-c, 36)
                    ff = f[a][b+c][t-1][x]
                    f[a][b][t][x] += prob * ff
                
                # finally, count the ties.
                prob = frac(6, 36)
                ff = f[a][b][t-1][x]
                f[a][b][t][x] += prob * ff
                
# OK, done applying the recurrence. Now it's time to read off the answer.
final_numerator = frac(0)
final_denominator = frac(0)
for t in range(1, 11):
    for x in range(20, 25):
        final_numerator += frac(x, t) * f[0][0][t][x]
        final_denominator += f[0][0][t][x]

out = final_numerator / final_denominator

print(final_denominator, final_denominator * 1., sep='\t\t')  # probability that A wins
print(out, out * 1., sep='\t\t')  # E[A's score per round | A wins]

I get a final answer of $$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \boxed{\frac{6480877805871607}{2825235479368820}} \approx 2.2939. \end{align}$$ Just for fun, I also had the program print the probability that A wins, which turns out to be $$P[A \text{ wins}] = \frac{20180253424063}{914039610015744} \approx 2.21\%$$ which means that the probability of a draw is $\approx 100\% - 2 \cdot 2.21\% \approx 95.58\%$.

Checking my work

A program like this is error prone. It could be easy to make a mistake in a math formula, and then also easy to make a typo in the implementation. Fortunately there's an easy way to confirm that we're getting the right answer. The expected value is just the long-run average if we played the game infinitely many times... so we can just simulate the game a million times and compute that average result. This doesn't require much math or tricky programming. The downside is that this answer will only be correct to a couple of decimal places, but that should be plenty to confirm whether the exact result above is correct.

Here's the Python code I used to run $10^6$ simulations and average the results:

import random
NUM_TRIALS = 10**6
TARGET_SCORE = 20
wins = 0
results_sum = 0

for _ in range(NUM_TRIALS):
    # set up for a new game.
    score_a = 0
    score_b = 0
    for rd in range(10):  # play up to 10 rounds
        da = random.randrange(1, 7)  # random integer from 1 to 6
        db = random.randrange(1, 7)  # second die roll
        v = abs(da - db)
        if da > db: score_a += v
        else: score_b += v
        m = max(score_a, score_b)
        if m >= TARGET_SCORE:  # did someone win?
            wins += 1
            results_sum += m / (rd+1)
            break  # if someone won, don't play more turns in this game.

print(results_sum / wins)  # E[winner score per round | somebody wins]
print(wins / NUM_TRIALS)  # chance that somebody wins

When I ran this, I got 2.2933288034777175 for the expected value and 0.044134 for the probability that someone wins. Rerunning the code a few times, I can see that the 2.29 seems correct and the digits after that are different each time so I shouldn't trust them. The 2.29 matches the first decimal places of my exact answer above, so I conclude the full implementation is probably correct.

Version 2 of the problem

I can change my Python code to solve this version instead just by changing one x to a 20. Can you see which one I should change? Hint: with that change, the new expected value is $$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \boxed{\frac{311815273904684}{141261773968441}} \approx 2.20736 \end{align}$$ and the probability of A winning vs B winning vs draw is exactly the same as before.

We can do some quick sanity checks. It makes sense that the probability of A winning is unchanged, because the rule change only affects the winner's final score; it can never change who wins. Also, this rule change will have the effect of sometimes giving the winner a lower final score, so it makes sense that the expected value came out a tad lower this time.