[Math] Dynamic Programming Investment Question

dynamic programming

I'm working on a dynamic programming problem from a textbook. My solution is different from that given in the solution manual, and I'm looking for input as to which answer is correct.

The problem is to choose between two investments ($A$ and $B$) at the beginning of each year, over a three year period. You are "given" \$5000 to invest. You can make only one investment each year, and can only invest \$5000 each time (any extra money accumulated is left "idle"). The objective is to find an investment policy that maximizes the expected amount of money you will have after three years. The investment returns are in the table below.

Investment return probabilities

My approach was to start in the last period ($n=3$). In this period, I will either: (1) have at least \$5000 so that I can make an investment, or (2) have no remaining capital and not be able to make an investment. In the second scenario I obviously don't have any decision to make. In the first scenario, the optimal choice would be $A$, as
$$E(A)=0.3 \cdot 0 + 0.7 \cdot 10000 = 7000 > 5500 = 0.9 \cdot 5000 + 0.1 \cdot 10000 = E(B)$$
where $E(\cdot)$ is expected value. Now, the solution manual (which I found online, and contains lots of hand written answers) claims that this conclusion – that $E(A) > E(B)$ – means I should choose $A$ in every period. I don't think this is correct.

It ignores the expected returns from future periods. Investment $A$ may have higher expected returns in a single given period, but it also has the risk of zero return and hence the risk of not being able to invest in future periods.

In the second period, for example, suppose you have \$5000 to invest. The expected return of $A$ is:
$$E(A) = 7000 + 0.7 \cdot 7000 = 11900,$$
where the second term captures the expected return of being able to invest in $A$ (the optimal choice) in period 3. The expected return of $B$, however, is
$$E(B) = 5500 + 7000 = 12500,$$
because there will be enough money to invest in $A$ in period 3 with a probability of 1 ($B$ returns at least \$5000).

Am I correct? Any input appreciated and thanks.

Best Answer

You concern makes sense.Moreover I am sure that in general you are correct and and optimal on 1 period strategy does not have to be optimal on all period. But for this specific example it seems for me it is indeed true.

It seems for me that your calculations of the return for the second period are wrong. At least I don't understand how do you get these numbers. So let's calculate from the beginning. Let $E_n(C,X_{n,n+1...})$ be an expected return of your invested return since period $n$ to the end (n=3) with given capital $C$ and the strategy $X$.$X$ is the sequence of strategies. Let $E_n^1(C,X)$ be 1 period expected return. Let $B_n(X)$ be maximum of such return.Let $E_n^1(C,X)$ be 1 period expected return. (In both return we include the rest of capital if $C>5000$). Also let $R^1_n(C,X)$ be realized 1 period return. Then Obviously $$ B_n(C)=\max_{X} E_n^1(C,X)+E(B_{n+1}(R_n^1(C,X))) $$ Let's start from $B_3$ (you almost finihes this part) So $$ B_3(C)=\max((C-5000,0)+\min(C,5000)*7/5; $$ Assume now that $C\ge 5000$ and $D=C-5000$. Then $B_3(C)=D+7000$ Assume now that $C<5000$. Then $B_3(C)=C*7/5$; However $C$ is random on the 3d period. It depends on the previous capital and the chosen strategy at the previos period. Assume that at the begining of the previous period we have capital $C_2$. Then $$ R^1_2(C_2,A)=(\max(C_2-5000,0)+2\min(C_2,5000) \text { with } P=0.7\\ R^1_2(C_2,A)=\max(C_2-5000,0) \text { with }P=0.3 \\ R^1_2(C_2,B)=(\max(C_2-5000,0)+2\min(C_2,5000) \text { with } P=0.1\\ R^1_2(C_2,B)=\max(C_2-5000,0) +\min(C_2,5000)=C_2\text { with }P=0.9 \ $$ Thus $$ E (B_3(R^1_2(C_2,A)))=0.7\cdot B_3((\max(C_2-5000,0)+2\cdot\min(C_2,5000))+0.3B_3(\max(C_2-5000,0))\\ E (B_3(R^1_2(C_2,B)))=0.1\cdot B_3((\max(C_2-5000,0)+2\cdot\min(C_2,5000))+0.9(B_3(C_2)) $$ The difference $$ E (B_3(R^1_2(C_2,A)))-E (B_3(R^1_2(C_2,B))) $$ It is not so simple calculation but checking different intervals for $C_2$: $[0,2500],[2500,5000],[5000,10000],[10000,\infty)$ one can prove that it always positive. Thus and since $B_3(C)$ is strictly monotonic in $C$ then to get $B_2$ we have to maximize $E_n^1(C,X)$ ONLY. Thus $$ B_2(C)=(C-5000,0)+\min(C,5000)*7/5+0.7 B_3(\max((C-5000,0)+2\min(C,5000))+0.3 B_3(\max((C-5000,0) $$ Notice now that since $C_1=5000$ we can easily can calculate $B_2$ for all 3 possible outcomes from the period $1$ - $0,5000,1000$. $$ B_2(0)=0; B_2(5000)=7000+0.7(5000+7000)=15400; B_2(10000)=5000+7000+0.7(5000+10000)+0.3 7000=24600; $$ Thus $$ 0.7 B_2(10000)+0.3 B_2(0)=0.7\cdot 24600=17220\\ 0.1 B_2(10000)+0.9 B_2(5000)=0.1\cdot 24600+0.9\cdot 15400=16320 $$ So again $A$ is optimal on the first period too. And finally $$ B_1(5000)=7000+0.7 B_2(10000)+0.3 B_2(0)=24220; $$ REMARK In general, however for me is absolutely not obvious that the optimal strategy on 1 period should be optimal on all period. Moreover, I am sure I can find contr examples when it is better to switch the strategies.