[Math] Probabilistic dynamic programming question

discrete optimizationdynamic programmingoperations research

A gambler has 2 dollars. He is allowed to play a game four times and
his goal is to maximize his probability of ending with at least 6
dollars . If the gambler bets $b$ dollars then with probability $0.4$
he wins the game and increases his capital position by $b$ dollars;
with probability $0.6$ he loses the game and decreases his capital by
$b$ dollars. On any play of game gambler may not bet more than he has
available. Determine the betting strategy that would maximize the
gamblers probability of attaining at least $6$ dollars by the end of
fourth game.Assume that bets of zero dollars(not betting) are
permissible.

By using probabilistic dynamic programming
solve this.

I defined as let $f_t(d)$ be the probability by the end of game 4, the gambler will have at least $6$ dollars. $d$ is the amount that he has at the beginning of game $t$ which he can use for betting and let $b_t(d) $ be the amount that he bets to attain $f_t(d)$.

But I don't understand how I can make a recursive relationship between $f_t(d)$ and $f_t+1$ and solve this problem . As a hint it is given

max{$0.4f_{t+1}(d+b)+0.6f_{t+1}(d−b)$}
Can someone please show me a way to do this problem using the hint.

Best Answer

It seems more like backward induction than dynamic programming to me.

In the last game, the gambler will bet $0$ dollars if he has at least $6$, winning with probability $1$, will bet $6-d$ if he has $3\le d\lt 6$, winning with probability $0.4$, and will give up and win with probability $0$ if he has less than $3$. So we know the value of the game before the last bet:

$$ f_4(d)=\begin{cases} 1&d\ge6\;,\\ 0.4&3\le d\lt 6\;,\\ 0&d\lt 3\;. \end{cases} $$

Now we can analyze the third bet. Again, for $d\ge6$ we have $f_3(d)=1$. For $4.5\le d\lt 6$ the gambler will bet $6-x$, winning with probability $0.4$ and getting another $0.4$ chance with probability $0.6$, for a total of $0.4+0.6\cdot 0.4=0.64$. For $3\le d\lt 4.5$ we can try to get to $6$ in one go by immediately betting $6-d$, which gives us a $0.4$ probability of winning, and we can't do any better by betting less since we'd then still have to win in the last round, so the value in this range is $0.4$. For $1.5\le d\lt3$ we try to double twice, for a chance of $0.4^2=0.16$, and for $d\lt1.5$ we give up. So to summarize,

$$ f_3(d)=\begin{cases} 1&d\ge6\;,\\ 0.64&4.5\le d\lt 6\;,\\ 0.4&3\le d\lt 4.5\;,\\ 0.16&1.5\le d\lt 3\;,\\ 0&d\lt 1.5\;. \end{cases} $$

You can go on like this to deduce the rest of the strategy. There will be some slightly less trivial decisions, but you can always make them by comparing the expected values, using the known game values of the next stage.

Edit in response to the comment: Your definition of $f_t(d)$ isn't very clearly formulated ($t$ doesn't occur in the sentence introducing $f_t(d)$), but since in the next sentence you write "$d$ is the amount that he has at the beginning of game $t$", I figured that $f_t(d)$ is meant to be the probability that the gambler will win (i.e. will have at least $6$ dollars after the fourth game) given that he has $d$ dollars before game $t$. Now, the way I understand your hint is that you can calculate $f_t(d)$ from $f_{t+1}$ as

$$ f_t(d)=\max_{0\le b\le d}\left(0.4f_{t+1}(d+b)+0.6f_{t+1}(d−b)\right)\;, $$

that is, knowing the probabilities for stage $t+1$ you can perform backward induction for stage $t$ by maximizing the winning probability over all possible bets. Now the way you've defined $f_t(d)$, it's literally only defined for $t=1,\ldots,4$, because there are $4$ games. But "before game $t$" was just some language you used to refer to a point in time, and you can extend that to let $t=5$ refer to the time after the fourth game. At that time, the probabilities are particularly simple, since at that time the outcome is decided: For $d\ge6$ you win (with probability $1$), and for $d\lt6$ you lose (i.e. you win with probability $0$). So the initial values for the recursion can be taken to be

$$ f_5(d)=\begin{cases}1&d\ge6\;,\\0&d\lt6\;.\end{cases} $$

Best Answer

Related Solutions

[Math] Dynamic Programming Investment Question

Related Question