Dice roll games – when is greedy/considering short-term gain optimal

conditional-probability, dice, gambling, infinite-games, probability

In this question, I refer to two separate games. The first is a game where you roll and accumulate your score until a six appears:

You play a game using a standard six-sided die. You start with 0
points. Before every roll, you decide whether you want to continue the
game or end it and keep your points. After each roll, if you rolled 6,
then you lose everything and the game ends. Otherwise, add the score
from the die to your total points and continue/stop the game. When
should one stop playing this game?

The other is a game I asked about before, where you roll and accumulate your score until a repeat appears:

I keep rolling a die, and my score is the sum of all my rolls.
However, if I roll a value I had rolled before, I lose all. What is
the optimal strategy?

From what I gather in the first link, the greedy approach in the accepted answer is not technically correct, since the calculated gain does not account for future rolls, although for some reason it is still optimal. In the second link (the question I asked), I did not account for future rolls, which hindered my analysis of when to stop (I have yet to figure out what the answerer means by "beating" the minimum sum of $1+2+3=6$). I am quite confused by these concepts, and therefore have two questions:

1) Sadly, this repeats part of the question I asked before: what exactly is meant by the long-term gain of future rolls, and how do you calculate it? Is it through some recursive probability calculation that accounts for our decision after each roll? Do we let the number of rolls tend to infinity?

2) More importantly, when is a greedy approach optimal, and how do we prove it? For the first link, I don't really understand the argument (despite the thorough debate about it).

Thanks!

Best Answer

You don't need to account for future rolls, because of the monotonicity of these games: as the game progresses, the potential gains don't increase while the risks grow. Also, in every state there exists at least one non-decreasing strategy (stop the game). That is why you can draw a graph of all states in the game, assign the immediate expected gain of rolling to each node, and be sure that a node with negative immediate gain cannot lead to a node with positive immediate gain. So your strategy is simply to end the game once the immediate gain is negative (you cannot lose a little at first and then gain more on future rolls).
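To make "immediate gain" concrete, here is a minimal sketch in Python (the function names are mine, not from either thread) computing the one-step expected gain of rolling again in each game:

```python
from fractions import Fraction

def gain_game1(p):
    """Game 1: holding p points, one more roll loses p with probability 1/6
    (a six) and otherwise adds one of 1..5, which sum to 15."""
    return Fraction(-p + 15, 6)

def gain_game2(rolled):
    """Game 2: with k distinct values rolled summing to p, one more roll
    repeats (losing p) with probability k/6; the unseen values sum to
    21 - p and each comes up with probability 1/6."""
    p, k = sum(rolled), len(rolled)
    return Fraction(-k * p + (21 - p), 6)

# Greedy rule: keep rolling while the immediate gain is positive.
print(gain_game1(14), gain_game1(15), gain_game1(16))  # 1/6, 0, -1/6
print(gain_game2({1, 2, 3}))                           # -1/2, so stop at k = 3
```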

First game

The game has only one variable as a state: the number of points $p$. So a strategy $S$ can depend only on $p$ (say $S(p)=1$ means roll, $S(p)=0$ means stop). Next, if there is a strategy $S_X$ that tells you to keep playing when you have $X$ points, i.e. $S_X(X)=1$, then the strategy $$\tilde S_X(p) = \begin{cases}1 & p\le X \\ S_X(p) & p>X\end{cases}$$ will give you at least the same number of points on average. If $S_Y(Y)=0$, then the same holds for $$ \tilde S_Y(p) = \begin{cases}S_Y(p) & p< Y \\ 0 & p\ge Y.\end{cases} $$

That means the optimal strategy is determined by a single critical number $P$: $S(p)=1$ if $p\le P$ and $S(p)=0$ if $p>P$. The question is how to find $P$. Let's compare the two strategies with $P_1=P$ and $P_2=P+1$. They tell you to do the same thing except at the moment when you have $p=P+1$ points: the first strategy tells you to stop, while the second tells you to roll the die one more time and then stop (because you will gain at least 1 point, so the new total is at least $P+2>P+1$).

The expectation of the first strategy, once you are at $p=P+1$, is $E_1=P+1$ points. The expectation of the second is $E_2=\frac{5(P+1)}{6} + \frac{1+2+3+4+5}{6}=\frac{5P+20}{6}$, so $E_1-E_2=\frac{P-14}{6}$. If $P<14$, then $E_1<E_2$; if $P>14$, then $E_1>E_2$; and if $P=14$, then $E_1=E_2$. Thus there are two optimal pure strategies, $P=14$ and $P=15$ (roll while $p\le 14$, and at exactly $p=15$ you are indifferent), and everything else is worse.
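If you prefer to see the threshold fall out of a computation, here is a small backward-induction sketch (mine, not from the linked thread; it assumes a cap beyond which stopping is clearly optimal, which the monotonicity argument above justifies):

```python
from fractions import Fraction

CAP = 60  # for p >= CAP the immediate gain (15 - p)/6 is hugely negative: stop
V = {p: Fraction(p) for p in range(CAP, CAP + 6)}  # value of stopping

for p in range(CAP - 1, -1, -1):
    # Rolling: a six (probability 1/6) ends the game with 0 points;
    # otherwise you move to state p + i for i = 1..5.
    roll = sum(V[p + i] for i in range(1, 6)) / 6
    V[p] = max(Fraction(p), roll)

for p in range(13, 18):
    roll = sum(V[p + i] for i in range(1, 6)) / 6
    decision = "roll" if roll > p else ("indifferent" if roll == p else "stop")
    print(p, decision)  # roll for p <= 14, indifferent at 15, stop for p >= 16
```

This matches the comparison above: rolling is strictly better through $p=14$, a wash at $p=15$, and worse from $p=16$ on.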

Second game

This is a different story, because now the state depends on which numbers $S_k=\{i_1,\ldots, i_k\}$ were rolled before. (But we don't need to add the dependence on $p$ explicitly, since $p=\sum S_k = i_1+\ldots+i_k$.)

First thought: if a strategy tells you to stop when the set $S_k$ with sum $p$ has been rolled, then you should also stop with any set $\tilde S_k$ of the same size whose sum $\tilde p$ satisfies $\tilde p > p$. This is because you risk more, $\frac k6\tilde p > \frac k6 p$, but stand to gain less, $\frac{21-\tilde p}6 < \frac{21-p}6$ (the $6-k$ unrolled values sum to $21-\tilde p$, and each comes up with probability $\frac16$).

That is why we only need to check the state with the minimum sum for each $k$ to see at which $k$ you should stop no matter what. Analysis of $S_3=\{1,2,3\}$ shows that you should always stop at $k=3$: even there, the immediate gain is $-\frac36\cdot 6 + \frac{21-6}6 = -\frac12 < 0$. And then your analysis of the first two throws is perfectly valid.
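Since the second game has only $2^6=64$ states, the claim can also be checked exhaustively. Here is a sketch (again mine) that solves the game by dynamic programming over subsets and confirms that the greedy one-step rule agrees with the optimal decision in every state, including stopping for every $k\ge 3$:

```python
from fractions import Fraction
from itertools import combinations

V = {}  # optimal expected score, keyed by the sorted tuple of values rolled so far
for k in range(6, -1, -1):
    for s in combinations(range(1, 7), k):
        p = sum(s)
        succ = (V[tuple(sorted(s + (j,)))] for j in range(1, 7) if j not in s)
        roll = sum(succ, Fraction(0)) / 6  # repeating one of the k values scores 0
        V[s] = max(Fraction(p), roll)

for s, v in V.items():
    p, k = sum(s), len(s)
    greedy = Fraction(21 - (k + 1) * p, 6) > 0  # immediate gain of one more roll
    assert greedy == (v > p)  # greedy decision matches the optimal one everywhere
    if k >= 3:
        assert not (v > p)    # never roll once three distinct values are down

print(V[()])  # optimal expected score from the start: 223/36, about 6.19
```

In particular, the only quantity that ever matters is the sign of the immediate gain $\frac{21-(k+1)p}{6}$, and it is negative for every state with $k\ge 3$.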