Let's play a dice game where each of $2$ players rolls one 6-sided dice each turn. The player who rolls the highest each turn gets paid what they rolled subtracted by what the other player rolled e.g. if Player $1$ rolls a $6$ and Player $2$ rolls a $2$, then Player $1$ earns $6 – 2 = 4$ points. No points for anyone in the case of a tie. The game ends after a player hits $20$ points (who wins the game), or $10$ rounds have passed. If nobody has hit $20$ points after $10$ rounds, then the game simply ends with no winner.
Question. How many points per round on average does the winning player earn?
My thoughts. Instinctively, the answer should be $20$ divided by the number of rounds played. But that ignores that we could end on $20$, $21$, $22$, $23$, $24$. So we may have to work backwards from those $6$ results and tally up round by round.
Clarification. I'm interested in the expected value of the winner's score divided by the number of rounds played, conditioned on somebody actually winning. I realize there's two possible interpretations of the problem:
- Winner was at $19$ points, then scored $3$ more in the final round, hence a final score of $22$.
- Winner was at $19$ points, then scored $3$ more in the final round, but their final score gets capped out at $20$.
I'm interested in seeing the solution for both interpretations of the problem.
Best Answer
First I'll focus on the situation where the score is not capped at 20, i.e. your case #1. I'll say how to handle case 2 afterward.
Also, a warning: I'm presenting a way to compute the exact answer using a Python script, not a solution via simple closed-form equations. I kinda suspect that the second type of answer doesn't exist or at least is quite hard to find for this problem.
Setting up equations
Let's think about calculating the expected value of (player A's score) / (turns played) conditioned on A winning the game. To do this I'll use dynamic programming. That's basically a fancy programming way to say I'll write down a recurrence relation and then make my computer plug into the recurrence over and over to get the answer.
Let $T$ be the number of turns played in the game. Let $A$ be A's final score. Define a function $f(a,b,t,x)$ to be the probability of taking $t$ more turns to end up with A winning with final score $x$, starting with A having $a$ points and B having $b$ points. Now we can use Wikipedia's formula for expected value conditioned on an event to rewrite the expected value you asked for:
$$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \sum_{r} r \frac{P[\frac A T = r \text{ and } A \text{ wins}]}{P(A)} \\ &= \frac{\sum_{t=0}^{10} \sum_{x=20}^{24} \frac{x}{t} f(0,0,t,x)}{\sum_{t=0}^{10} \sum_{x=20}^{24} f(0,0,t,x)} \end{align}$$
That formula will let us compute the answers you want if we have a way to compute $f(0,0,t,x)$ for various values of $t$ and $x$. With that in mind, let's think about how to compute various values of $f$.
To compute $f(a,b,t,x)$ I'll use recursion, first computing the values with small values of $t$ and then working my way up. The base cases are:
Now let's build a recursive formula by thinking about what will happen on the next turn. We'll have $$\begin{align} f(a,b,t,x) &= \sum_{c=1}^5 f(a+c, b, t-1, x) P[A \text{ scores } c \text{ on the next turn}] \\ &\quad \quad + \sum_{c=1}^5 f(a, b+c, t-1, x) P[B \text{ scores } c \text{ on the next turn}] \\ &\quad \quad + f(a, b, t-1, x) P[\text{no points scored on the next turn}] \\ \end{align}$$
The final "pencil and paper" step is to compute the probabilities of each outcome for the next turn. Once we have that, the rules above are actually already enough to compute $f(a,b,x,t)$ for all the values we care about.
There are 36 possible combinations of die rolls on the next turn. There's 1 combination where A scores 5 points ((6,1)), 2 combinations where A scores 4 points ((6,2) or (5,1)), and so on; in general, the chance of A scoring $c$ points is $\frac{6-c}{36}$. The chance of B scoring $c$ points is the same. Finally, the chance of a tie is $\frac 1 6$.
Sample implementation in Python
Here is a sample Python implementation of the above formulas. In the interest of making the code readable, I tried to Python code. I hope it's readable; I just directly implemented the formulas I typed above. Remember that the key idea is to first set up the base cases with $t=0$, then use the recursive equation to solve the cases with $t=1$ by plugging in the already-computed $t=0$ answers, then solve the $t=2$ cases by using the $t=1$ answers, and so on. Anyway, feel free to ask questions about this if you have them.
I get a final answer of $$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \boxed{\frac{6480877805871607}{2825235479368820}} \approx 2.2939. \end{align}$$ Just for fun, I also had the program print the probability that A wins, which turns out to be $$P[A \text{ wins}] = \frac{20180253424063}{914039610015744} \approx 2.21\%$$ which means that the probability of a draw is $\approx 100\% - 2 \cdot 2.21\% \approx 95.58\%$.
Checking my work
A program like this is error prone. It could be easy to make a mistake in a math formula, and then also easy to make a typo in the implementation. Fortunately there's an easy way to confirm that we're getting the right answer. The expected value is just the long-run average if we played the game infinitely many times... so we can just simulate the game a million times and compute that average result. This doesn't require much math or tricky programming. The downside is that this answer will only be correct to a couple of decimal places, but that should be plenty to confirm whether the exact result above is correct.
Here's the Python code I used to run $10^6$ simulations and average the results:
When I ran this, I got 2.2933288034777175 for the expected value and 0.044134 for the probability that someone wins. Rerunning the code a few times, I can see that the 2.29 seems correct and the digits after that are different each time so I shouldn't trust them. The 2.29 matches the first decimal places of my exact answer above, so I conclude the full implementation is probably correct.
Version 2 of the problem
I can change my Python code to solve this version instead just by changing one
x
to a20
. Can you see which one I should change? Hint: with that change, the new expected value is $$\begin{align} E\left[\frac{A}{T} \bigg| A \text{ wins}\right] &= \boxed{\frac{311815273904684}{141261773968441}} \approx 2.20736 \end{align}$$ and the probability of A winning vs B winning vs draw is exactly the same as before.We can do some quick sanity checks. It makes sense that the probability of A winning is unchanged, because the rule change only affects the winner's final score; it can never change who wins. Also, this rule change will have the effect of sometimes giving the winner a lower final score, so it makes sense that the expected value came out a tad lower this time.