When to stop depends on how many rounds you're allowed to play. If you're allowed to play for arbitrary many rounds, you could decide not to stop until you have a million points. Then with probability $1$ sooner or later (most likely much later) you will leave the table with a million points.
So the game is only interesting when there's a time limit after which you must stop playing and leave with whatever you've got at that time if you haven't cashed out before that.
A strategy is a function $f(N,S)$ that gives our decision to either cash out or keep playing as the output of two inputs: $N$ is the number of rounds left, and $S$ is how many points you have when you make the decision. The fact that you're supposed to use dynamic programming is a hint that you're not supposed to find an explicit formula for $f(N,S)$, but rather to compute a table of values for $f(N,S)$.
It is easy enough to analyze what the optimal choice for $N=1$ is. If you quit now your expected winnings are $S$; if you play the last round your expected winnings are $\frac16 0 + \frac56(S+7)$, (since a roll that doesn't bankrupt you wins you 7 points on average). So you should play the last round iff $\frac {5S}6 + \frac{35}6 \ge S$ which is if $S\le 35$.
In order for you to proceed to $N=2$ you need to compute an auxiliary table of the expected value of the game before you make the $f(1,S)$ for various $S$. For large enough $S$ you should always cash out, in which case the expected value is $S$ itself -- so you only need to tabulate the expected values for $S$ less than the cash-out threshold.
Now you can compute $f(2,S)$ as follows: The expected value of walking out is $S$. To find the expected value of playing, consider the various outcomes of the dice roll: you could go bankrupt or win 3, 4, 5, ..., 10, 11 points, each with a particular probablity. Each of these cases take you to a state $(1,S')$ that you have already computed the expected value of; the appropriately weighted average of those expectations is the value of going on. Compare this with $S$ to figure out if you should leave or go on.
Now do the same thing for $f(3,-)$, $f(4,-)$ and so forth, until you reach the actual length of your game.
(Note that the actual strategic decisions can be made in constant time on the fly if only you have tabulated the values of the game at each possible state).
I think there is not a general solution. Here is my solution on 4 players, rolling 4-sided dice case. Calculation is pretty complicated. I don't know if there is any better method.
First of all, we need to figure out results of each round of rolling. 4 players roll 4 dice, then we can get:
- 0 winner: $p_0=\frac{10}{64}$
- AAAA: $p_01=4*\frac{1}{4^4}=\frac{1}{64}$
- AABB: $p_02={4\choose2}\frac{\frac{4*3}{2}}{4^4}=\frac{9}{64}$
- 1 winner: $p_1=\frac{12}{64}$
- AAAB: $p={4\choose1}\frac{4*3}{4^4}=\frac{12}{64}$
- 2 winners: $p_2=\frac{36}{64}$
- AABC: $p={4\choose2}\frac{4*3*2}{4^4}=\frac{36}{64}$
- 3 winners: $p_3=0$
- 4 winners: $p_4=\frac{6}{64}$
- ABCD: $p=\frac{4*3*2*1}{4^4}=\frac{6}{64}$
$E(n)$ is the expected number of rolls to finish a game where $n$ players have not yet won at least once. So we want to know $E(4)$.
$$E(4)=1+p_0*E(4)+p_1*E(3)+p_2*E(2)$$
$$E(3)=1+p_0*E(3)+\frac{1}{4}p_1*E(3)+\frac{3}{4}p_1*E(2)+\frac{1}{2}p_2*E(2)+\frac{1}{2}p_2*E(1)$$
$$E(2)=1+p_0*E(2)+\frac{2}{4}p_1*E(2)+\frac{2}{4}p_1*E(1)+\frac{1}{6}p_2*E(2)+\frac{4}{6}p_2*E(1)$$
$$E(1)=1+p_0*E(1)+\frac{3}{4}p_1*E(1)+\frac{3}{6}p_2*E(1)$$
And I got $E(4)=6.401612$ which is not close to the simulation result. If you find out what's wrong with it please comment. Thanks a lot!
Best Answer
Here's a Sage session that calculates the expected value and the optimal strategy:
Thus the game's value is about $25.4$, and the reward at which to stop depends strongly on the probability of repeating the last roll, and is also slightly higher for the lower of two equiprobable sums (e.g. $2$ and $12$), as might be expected.