[Math] Simple dice game: Optimal strategy

game theory, probability

Here's the description of a dice game which has puzzled me for quite some time (the game comes from a book which offered a rather unsatisfactory solution; but then, its focus was on programming, so this is probably excusable).

The game goes as follows:

Two players play against each other, both starting with a score of 0. The winner is the first player to reach a score of 100 or more. The players play in turn. The score added in each round is determined as follows: the player throws a die. If the die does not show a 1, he has the option either to stop and have the points accumulated so far added to his score, or to throw again; this repeats until he either stops or throws a 1. As soon as he throws a 1, his turn ends and no points are added to his score: any points he accumulated in this round are lost. Afterwards it is the other player's turn.

The question now is: what is the best strategy for this game? The book suggested testing which of the following two strategies gives the better result:

  • Throw 5 times (if possible), then stop.
  • If the accumulated points in this round add up to 20 or more, stop, otherwise continue.

The rationale of the second rule is that you want to continue exactly while the next throw increases your expected score: one more throw gains $4$ points on average with probability $5/6$ and loses your accumulated $P$ points with probability $1/6$, so the expected change is $(20-P)/6$, which is positive exactly when $P<20$. Of course it doesn't need testing to see that the second strategy is better than the first: if you've accumulated e.g. 10 points, it doesn't matter whether you accumulated them by throwing a 2 five times or a 5 twice.
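Still, since the game comes from a programming book, the comparison is easy to check by simulation. Here is a short sketch (the rule names and helper function are mine, not from the book) that estimates the expected score of a single round under each rule:

```python
import random

# Monte Carlo estimate of the expected score of one round under a given
# stopping rule; any thrown 1 ends the round with 0 points.
def round_score(stop_rule, rng):
    points = 0
    throws = 0
    while True:
        d = rng.randint(1, 6)
        if d == 1:
            return 0              # a 1 wipes out the round's points
        points += d
        throws += 1
        if stop_rule(points, throws):
            return points

def rule_five(points, throws):     # throw 5 times (if possible), then stop
    return throws >= 5

def rule_twenty(points, throws):   # stop at 20 or more accumulated points
    return points >= 20

rng = random.Random(0)
n = 100_000
avg_five = sum(round_score(rule_five, rng) for _ in range(n)) / n
avg_twenty = sum(round_score(rule_twenty, rng) for _ in range(n)) / n
print(f"5-throws rule: {avg_five:.2f}, 20-points rule: {avg_twenty:.2f}")
```

The 20-points rule should come out measurably ahead per round, consistent with the breakeven argument above.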

However, it is also easy to see that this second strategy isn't the best one either: after all, the ultimate goal is not to maximize the increase per round, but to maximize the probability of winning; the two are related, but not the same. For example, imagine you have been very unlucky and are still at a very low score, while your opponent already has 99 points. It's your turn, you've already accumulated some points (but not enough to get you to 100), and you have to decide whether to stop or to continue. If you stop, you secure the points, but your opponent has a 5/6 chance of winning on his next move. Say that if you stop, the optimal strategy from then on is to try to reach 100 points in one run, and that the probability of succeeding is $p$. Then if you stop, since your opponent gets his chance to win first, your total probability of winning is just
$$\frac{1}{6}\left(p+\frac{1-p}{6}\left(p+\frac{1-p}{6}\left(p+\dotsb\right)\right)\right)=\frac{p}{p+5}.$$
On the other hand, if you continue to 100 points right now, you win this round with probability $p$ before your opponent even gets a chance to try; if you fail (probability $1-p$), you are left trying in later rounds with some lower success probability $p'$, for a total winning probability of $p+(1-p)\,\frac{p'}{p'+5}$. Even if $p'=0$ (i.e. if you don't succeed now, you lose), you'd still win with probability $p>p/(p+5)$ by continuing. So you should continue no matter how slim your chances, even if your accumulated points this round are above 20, because if you stop, your chances will certainly be worse. Since at some point the optimal strategy must contain a step where you try to go beyond 100 (because that's where you win), by induction you can conclude: if your opponent already has 99 points, your best strategy is, unconditionally, to try to reach 100 points in one run.
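As a quick sanity check, the nested expression for the stopping case is the fixed point of $q=\frac{1}{6}\bigl(p+(1-p)q\bigr)$, and iterating that map (the function name below is mine) confirms the closed form $p/(p+5)$ numerically:

```python
# Iterate q -> (1/6) * (p + (1 - p) * q), i.e. the nested expression from
# the text, and compare the limit with the closed form p / (p + 5).
# The map contracts by a factor of at most 1/6 per step, so it converges fast.
def stop_then_race_win_prob(p, iterations=100):
    q = 0.0
    for _ in range(iterations):
        q = (p + (1 - p) * q) / 6
    return q

for p in [0.05, 0.3, 0.7, 1.0]:
    assert abs(stop_then_race_win_prob(p) - p / (p + 5)) < 1e-12
```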

Of course this "brute-force rule" is for that specific situation (it also applies if the opponent has 98 points, for obvious reasons: a single non-1 throw still brings him to 100). If you played the brute-force rule from the beginning, you'd lose even against someone who throws just once each round. Indeed, if both players are about equal and far enough from the final 100 points, intuitively I think the 20-points rule is quite good. Also, intuitively I think that if you are far ahead of your opponent, you should play even more safely and stop earlier.

As the current game situation is described by three numbers, your score ($Y$), your opponent's score ($O$), and the points already collected in this round ($P$), and your decision is either to continue ($C$) or to stop ($S$), a strategy is completely given by a function
$$s:\{(Y, O, P)\}\to \{C,S\}$$
where the following rules are obvious:

  • If $Y+P\ge 100$ then $s(Y,O,P)=S$ (if you have already collected enough points to reach 100, the only reasonable move is to stop).
  • $s(Y, O, 0)=C$ (it doesn't make sense to stop before you have thrown at least once).

Also, just above I derived the following rule:

  • $s(Y,98,P)=s(Y,99,P)=C$ unless the first rule kicks in.

I believe the following rule should also hold (but I have no idea how to prove it):

  • If $s(Y,O,P)=S$ then also $s(Y,O,P+1)=S$.

If that belief is true, then the description of a strategy can be simplified to a function $g(Y,O)$ which gives the smallest $P$ at which you should stop.

However, that's all I've figured out. What I'd really like to know is: What is the optimal strategy for this game?

Best Answer

There is no simple description of the Nash equilibrium of this game.

You can compute the best strategy starting from positions where both players are about to win and working backwards from there. Let $p(Y,O,P)$ be the probability that you win if you are in situation $(Y,O,P)$ and both players make the best choices. The difficulty is that to compute the strategy and the winning probability at a situation $(Y,O,P)$, your choice depends on the probability $p(O,Y,0)$. So there is a (piecewise affine, decreasing and contracting) function $F_{(Y,O,P)}$ such that $p(Y,O,P) = F_{(Y,O,P)}(p(O,Y,0))$; in particular, you need to find the fixed point of the composition $F_{(Y,O,0)} \circ F_{(O,Y,0)}$ in order to find the true $p(O,Y,0)$, and deduce everything else from there.
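To make the computation concrete, here is a sketch in Python. Instead of solving the per-pair fixed points explicitly, it sweeps a value-iteration update over all states, which converges to the same fixed point since the update contracts; to keep it fast, the target is reduced from 100 to 20 points. All names are mine, and this is an illustration of the method, not the answerer's original code:

```python
# Value iteration for the dice game with a reduced target of 20 points.
# val[(Y, O, P)] = win probability of the player to move, holding banked
# score Y and round points P, against an opponent with banked score O.
TARGET = 20

def solve(target=TARGET, sweeps=200):
    states = [(y, o, p) for y in range(target)
                        for o in range(target)
                        for p in range(target - y)]
    val = {s: 0.5 for s in states}

    def backup(y, o, p, v):
        # Continuing: a 1 (probability 1/6) hands the turn to the opponent
        # with the round points lost; otherwise the die adds 2..6 points,
        # and reaching the target means an immediate win (stop and win).
        cont = (1 - v[(o, y, 0)]) / 6
        for d in range(2, 7):
            cont += (1.0 if y + p + d >= target else v[(y, o, p + d)]) / 6
        if p == 0:
            return cont                      # must throw at least once
        stop = 1 - v[(o, y + p, 0)]          # bank P, opponent moves next
        return max(stop, cont)

    for _ in range(sweeps):
        val = {s: backup(*s, val) for s in states}
    return val

val = solve()
print(f"first player's winning chance from (0, 0, 0): {val[(0, 0, 0)]:.4f}")
```

From the resulting table the optimal decision at each $(Y,O,P)$ can be read off by comparing the stopping and continuing values.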

After computing this for a 100-point game and doing some inspection, there is no function $g(Y,O)$ such that the strategy simplifies to "stop if you have accumulated $g(Y,O)$ points or more". For example, at $Y=61$, $O=62$, you should stop when you have exactly $20$ or $21$ points, and continue otherwise.

If you let $g(Y,O)$ be the smallest number of points $P$ such that you should stop at $(Y,O,P)$, then $g$ does not look nice at all. It is not monotonic and does strange things, except in the region where you should just keep playing until you lose or win in one move.
