[Math] Dynamic Programming Investment Question

dynamic programming

I'm working on a dynamic programming problem from a textbook. My solution is different from that given in the solution manual, and I'm looking for input as to which answer is correct.

The problem is to choose between two investments ($A$ and $B$) at the beginning of each year, over a three year period. You are "given" \$5000 to invest. You can make only one investment each year, and can only invest \$5000 each time (any extra money accumulated is left "idle"). The objective is to find an investment policy that maximizes the expected amount of money you will have after three years. The investment returns are in the table below.

Investment return probabilities

My approach was to start in the last period ($n=3$). In this period, I will either: (1) have at least \$5000 so that I can make an investment, or (2) have no remaining capital and not be able to make an investment. In the second scenario I obviously don't have any decision to make. In the first scenario, the optimal choice would be $A$, as
$$E(A)=0.3 \cdot 0 + 0.7 \cdot 10000 = 7000 > 5500 = 0.9 \cdot 5000 + 0.1 \cdot 10000 = E(B)$$
where $E(\cdot)$ is expected value. Now, the solution manual (which I found online, and contains lots of hand written answers) claims that this conclusion – that $E(A) > E(B)$ – means I should choose $A$ in every period. I don't think this is correct.

It ignores the expected returns from future periods. Investment $A$ may have higher expected returns in a single given period, but it also has the risk of zero return and hence the risk of not being able to invest in future periods.

In the second period, for example, suppose you have \$5000 to invest. The expected return of $A$ is:
$$E(A) = 7000 + 0.7 \cdot 7000 = 11900,$$
where the second term captures the expected return of being able to invest in $A$ (the optimal choice) in period 3. The expected return of $B$, however, is
$$E(B) = 5500 + 7000 = 12500,$$
because there will be enough money to invest in $A$ in period 3 with a probability of 1 ($B$ returns at least \$5000).

Am I correct? Any input appreciated and thanks.

Best Answer

You concern makes sense.Moreover I am sure that in general you are correct and and optimal on 1 period strategy does not have to be optimal on all period. But for this specific example it seems for me it is indeed true.

It seems for me that your calculations of the return for the second period are wrong. At least I don't understand how do you get these numbers. So let's calculate from the beginning. Let $E_n(C,X_{n,n+1...})$ be an expected return of your invested return since period $n$ to the end (n=3) with given capital $C$ and the strategy $X$.$X$ is the sequence of strategies. Let $E_n^1(C,X)$ be 1 period expected return. Let $B_n(X)$ be maximum of such return.Let $E_n^1(C,X)$ be 1 period expected return. (In both return we include the rest of capital if $C>5000$). Also let $R^1_n(C,X)$ be realized 1 period return. Then Obviously $$ B_n(C)=\max_{X} E_n^1(C,X)+E(B_{n+1}(R_n^1(C,X))) $$ Let's start from $B_3$ (you almost finihes this part) So $$ B_3(C)=\max((C-5000,0)+\min(C,5000)*7/5; $$ Assume now that $C\ge 5000$ and $D=C-5000$. Then $B_3(C)=D+7000$ Assume now that $C<5000$. Then $B_3(C)=C*7/5$; However $C$ is random on the 3d period. It depends on the previous capital and the chosen strategy at the previos period. Assume that at the begining of the previous period we have capital $C_2$. Then $$ R^1_2(C_2,A)=(\max(C_2-5000,0)+2\min(C_2,5000) \text { with } P=0.7\\ R^1_2(C_2,A)=\max(C_2-5000,0) \text { with }P=0.3 \\ R^1_2(C_2,B)=(\max(C_2-5000,0)+2\min(C_2,5000) \text { with } P=0.1\\ R^1_2(C_2,B)=\max(C_2-5000,0) +\min(C_2,5000)=C_2\text { with }P=0.9 \ $$ Thus $$ E (B_3(R^1_2(C_2,A)))=0.7\cdot B_3((\max(C_2-5000,0)+2\cdot\min(C_2,5000))+0.3B_3(\max(C_2-5000,0))\\ E (B_3(R^1_2(C_2,B)))=0.1\cdot B_3((\max(C_2-5000,0)+2\cdot\min(C_2,5000))+0.9(B_3(C_2)) $$ The difference $$ E (B_3(R^1_2(C_2,A)))-E (B_3(R^1_2(C_2,B))) $$ It is not so simple calculation but checking different intervals for $C_2$: $[0,2500],[2500,5000],[5000,10000],[10000,\infty)$ one can prove that it always positive. Thus and since $B_3(C)$ is strictly monotonic in $C$ then to get $B_2$ we have to maximize $E_n^1(C,X)$ ONLY. Thus $$ B_2(C)=(C-5000,0)+\min(C,5000)*7/5+0.7 B_3(\max((C-5000,0)+2\min(C,5000))+0.3 B_3(\max((C-5000,0) $$ Notice now that since $C_1=5000$ we can easily can calculate $B_2$ for all 3 possible outcomes from the period $1$ - $0,5000,1000$. $$ B_2(0)=0; B_2(5000)=7000+0.7(5000+7000)=15400; B_2(10000)=5000+7000+0.7(5000+10000)+0.3 7000=24600; $$ Thus $$ 0.7 B_2(10000)+0.3 B_2(0)=0.7\cdot 24600=17220\\ 0.1 B_2(10000)+0.9 B_2(5000)=0.1\cdot 24600+0.9\cdot 15400=16320 $$ So again $A$ is optimal on the first period too. And finally $$ B_1(5000)=7000+0.7 B_2(10000)+0.3 B_2(0)=24220; $$ REMARK In general, however for me is absolutely not obvious that the optimal strategy on 1 period should be optimal on all period. Moreover, I am sure I can find contr examples when it is better to switch the strategies.

Related Solutions

[Math] Probabilistic dynamic programming question

It seems more like backward induction than dynamic programming to me.

In the last game, the gambler will bet $0$ dollars if he has at least $6$, winning with probability $1$, will bet $6-d$ if he has $3\le d\lt 6$, winning with probability $0.4$, and will give up and win with probability $0$ if he has less than $3$. So we know the value of the game before the last bet:

$$ f_4(d)=\begin{cases} 1&d\ge6\;,\\ 0.4&3\le d\lt 6\;,\\ 0&d\lt 3\;. \end{cases} $$

Now we can analyze the third bet. Again, for $d\ge6$ we have $f_3(d)=1$. For $4.5\le d\lt 6$ the gambler will bet $6-x$, winning with probability $0.4$ and getting another $0.4$ chance with probability $0.6$, for a total of $0.4+0.6\cdot 0.4=0.64$. For $3\le d\lt 4.5$ we can try to get to $6$ in one go by immediately betting $6-d$, which gives us a $0.4$ probability of winning, and we can't do any better by betting less since we'd then still have to win in the last round, so the value in this range is $0.4$. For $1.5\le d\lt3$ we try to double twice, for a chance of $0.4^2=0.16$, and for $d\lt1.5$ we give up. So to summarize,

$$ f_3(d)=\begin{cases} 1&d\ge6\;,\\ 0.64&4.5\le d\lt 6\;,\\ 0.4&3\le d\lt 4.5\;,\\ 0.16&1.5\le d\lt 3\;,\\ 0&d\lt 1.5\;. \end{cases} $$

You can go on like this to deduce the rest of the strategy. There will be some slightly less trivial decisions, but you can always make them by comparing the expected values, using the known game values of the next stage.

Edit in response to the comment: Your definition of $f_t(d)$ isn't very clearly formulated ($t$ doesn't occur in the sentence introducing $f_t(d)$), but since in the next sentence you write "$d$ is the amount that he has at the beginning of game $t$", I figured that $f_t(d)$ is meant to be the probability that the gambler will win (i.e. will have at least $6$ dollars after the fourth game) given that he has $d$ dollars before game $t$. Now, the way I understand your hint is that you can calculate $f_t(d)$ from $f_{t+1}$ as

$$ f_t(d)=\max_{0\le b\le d}\left(0.4f_{t+1}(d+b)+0.6f_{t+1}(d−b)\right)\;, $$

that is, knowing the probabilities for stage $t+1$ you can perform backward induction for stage $t$ by maximizing the winning probability over all possible bets. Now the way you've defined $f_t(d)$, it's literally only defined for $t=1,\ldots,4$, because there are $4$ games. But "before game $t$" was just some language you used to refer to a point in time, and you can extend that to let $t=5$ refer to the time after the fourth game. At that time, the probabilities are particularly simple, since at that time the outcome is decided: For $d\ge6$ you win (with probability $1$), and for $d\lt6$ you lose (i.e. you win with probability $0$). So the initial values for the recursion can be taken to be

$$ f_5(d)=\begin{cases}1&d\ge6\;,\\0&d\lt6\;.\end{cases} $$

[Math] Dynamic programming– chocolate bar

The number of steps in any process is always $mn-1$ since the number of pieces increases by $1$ after each cut, and you want to reach a state with $mn$ pieces.

Here is a possible way to solve the problem with dynamic programming. Set up a bidimensional array M[][] such that M[x][y] is the minimum number of steps to squarisize the grid. It is then clear tht M[x][y] is equal to $\min(\min( M[k][y]+M[x-k][y]+1),\min(M[x][k]+M[x][y-k]+1))$, where $k$ ranges over all apropriate values.

Using this we can calculate M[][] recursively as is shown in thefollowing code:

#include <bits/stdc++.h>
using namespace std;

const int MAX=100;
const int INF=1e9;
int M[MAX][MAX];

void actualize(int x, int y){
    int minimum=INF;
    for(int k=1;k<x;k++){
        minimum=min(minimum,M[k][y]+M[x-k][y]+1);
    }
    for(int k=1;k<y;k++){
        minimum=min(minimum,M[x][k]+M[x][y-k]+1);
    }
    M[x][y]=minimum;
}

int main(){
    for(int i=1;i<MAX;i++){
        for(int j=1;j<MAX;j++){
            if(i==1 && j==1) continue;
            actualize(i,j);
        }
    }
    int m,n;
    cin >> m >> n;
    cout << M[m][n] << endl;
}

You can optimize this code by making an auxiliary table calculating partial minimums, but of course this is irrelevant, since the actual optimal solution is to just print $mn-1$