[Math] What’s the optimal strategy for this dice game?

dice · dynamic programming · probability

I'm working on a dynamic programming problem. I want to find the optimal strategy and simulate it.

The game's description:

The player rolls two dice.

  • If the numbers shown by the two dice are different, then the player adds their sum ($2+3=5$, for example) to his cumulative reward.

  • If the numbers shown by the two dice are equal, then the player loses his entire reward.

I started modeling the problem. The state of the system is as follows:
I chose an arbitrary number (300, for example) as a bound on the maximum reward in an $N$-round game. The state value is $V_k(S,D)$, where $S$ is the cumulative sum and $D$ is the sum of the two numbers shown on the two dice.

If we suppose that the number of rounds is finite, what is the optimal strategy, in a simple form?

In the case of an infinite game, what is the expected reward?

Reference: http://people.brandeis.edu/~igusa/Math56aS08/Math56a_S08_notes041.pdf

Thank you in advance guys 🙂

Best Answer

When to stop depends on how many rounds you're allowed to play. If you're allowed to play for arbitrarily many rounds, you could decide not to stop until you have a million points. Then with probability $1$, sooner or later (most likely much later), you will leave the table with a million points.

So the game is only interesting when there's a time limit after which you must stop playing and leave with whatever you've got at that time if you haven't cashed out before that.

A strategy is a function $f(N,S)$ that gives our decision to either cash out or keep playing as the output of two inputs: $N$ is the number of rounds left, and $S$ is how many points you have when you make the decision. The fact that you're supposed to use dynamic programming is a hint that you're not supposed to find an explicit formula for $f(N,S)$, but rather to compute a table of values for $f(N,S)$.

It is easy enough to analyze what the optimal choice is for $N=1$. If you quit now, your expected winnings are $S$; if you play the last round, your expected winnings are $\frac16\cdot 0 + \frac56(S+7)$ (since a roll that doesn't bankrupt you wins you $7$ points on average). So you should play the last round iff $\frac{5S}6 + \frac{35}6 \ge S$, that is, iff $S\le 35$.
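As a quick sanity check on that $S\le 35$ threshold, here is a minimal Python sketch (the function name and the printed range are my own, purely illustrative) that brute-forces the 36 equally likely outcomes of the last roll:

```python
from fractions import Fraction

def value_of_rolling_last_round(s):
    """Expected final reward if you roll the last round from score s,
    averaging over all 36 equally likely outcomes of two dice."""
    total = Fraction(0)
    for a in range(1, 7):
        for b in range(1, 7):
            # Doubles wipe the score out; anything else adds the sum.
            total += Fraction(0 if a == b else s + a + b, 36)
    return total

# Rolling is (weakly) better than cashing out exactly for s <= 35.
print([s for s in range(30, 40) if value_of_rolling_last_round(s) >= s])
```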

To proceed to $N=2$, you first need an auxiliary table of the expected value of the game in each state $(1,S)$ when you follow $f(1,S)$, for the various values of $S$. For large enough $S$ you should always cash out, in which case the expected value is $S$ itself -- so you only need to tabulate the expected values for $S$ below the cash-out threshold.

Now you can compute $f(2,S)$ as follows: the expected value of walking out is $S$. To find the expected value of playing, consider the various outcomes of the dice roll: you could go bankrupt or win $3, 4, 5, \ldots, 10, 11$ points, each with a particular probability. Each of these cases takes you to a state $(1,S')$ whose expected value you have already computed; the appropriately weighted average of those expectations is the value of going on. Compare this with $S$ to figure out whether you should leave or go on.

Now do the same thing for $f(3,-)$, $f(4,-)$ and so forth, until you reach the actual length of your game.
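Here is a minimal Python sketch of that tabulation, assuming (as the description of the bankrupt case above suggests) that doubles send you back to a score of $0$ but let you keep playing; the names `solve` and `SUM_COUNTS`, the 10-round example, and the tie-break in favour of rolling when both options are equal are my own choices:

```python
from fractions import Fraction

# Counts of each non-double sum of two dice (36 outcomes in total;
# the 6 doubles reset the score to 0 instead of adding anything).
SUM_COUNTS = {3: 2, 4: 2, 5: 4, 6: 4, 7: 6, 8: 4, 9: 4, 10: 2, 11: 2}

def solve(num_rounds):
    """Tabulate V[n][s], the expected final reward with n rounds left and
    current score s, and play[n][s], whether rolling again is optimal."""
    max_s = 11 * num_rounds          # largest score reachable in the game
    V = [[Fraction(s) for s in range(max_s + 1)]]   # n = 0: you must cash out
    play = [[False] * (max_s + 1)]
    for n in range(1, num_rounds + 1):
        Vn, Pn = [], []
        for s in range(max_s + 1):
            # Value of rolling: 6/36 chance of dropping back to score 0,
            # otherwise move to score s + d with d a non-double sum.
            # (The clamp only matters in states that are unreachable anyway.)
            roll = Fraction(6, 36) * V[n - 1][0] + sum(
                Fraction(c, 36) * V[n - 1][min(s + d, max_s)]
                for d, c in SUM_COUNTS.items())
            Vn.append(max(Fraction(s), roll))
            Pn.append(roll >= Fraction(s))   # tie broken in favour of rolling
        V.append(Vn)
        play.append(Pn)
    return V, play

V, play = solve(10)
print(max(s for s in range(len(play[1])) if play[1][s]))   # 35, as derived above
print(float(V[10][0]))    # expected reward of a fresh 10-round game
```

Exact rational arithmetic via `Fraction` is overkill, but it makes the threshold at $35$ show up exactly rather than up to floating-point noise.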

(Note that the actual strategic decisions can be made in constant time on the fly, once you have tabulated the value of the game at each possible state.)
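Continuing that sketch, here is a short simulation that plays the tabulated strategy on the fly (the `simulate` helper and the 100,000-game run are illustrative, and reuse `solve` and `play` from above):

```python
import random

def simulate(num_rounds, play):
    """Play one game, looking the decision up in play[n][s] at each
    state; each lookup is a constant-time table access."""
    s = 0
    for n in range(num_rounds, 0, -1):
        if not play[n][s]:                     # strategy says cash out now
            return s
        a, b = random.randint(1, 6), random.randint(1, 6)
        s = 0 if a == b else s + a + b         # doubles reset the score
    return s

V, play = solve(10)                            # tables from the sketch above
games = [simulate(10, play) for _ in range(100_000)]
print(sum(games) / len(games), float(V[10][0]))   # empirical mean vs. DP value
```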
