[Math] Dynamic programming problem

dynamic programmingprobability

A man is in a room, with $n$ passages leading out. For passage $i$, $i = 1,…,n$, there is probability $p_i$ of escaping, $q_i$ of being killed and $r_i$ of returning to the room, where $p_i + q_i + r_i = 1$. In which order should the man try the passages so as to maximise his chances of eventual escape?

I am trying to formulate this as a dynamic programming problem, but am not sure how to associate costs with it. I am also not sure if I need to use discounting for the "death" scenario.

Any help would be greatly appreciated.

Best Answer

If you only have two passages, your chance of escape is $p_a+r_ap_b$ or $p_b+r_bp_a$.
Use $r_a=1-p_a-q_a$, this is $p_a+p_b-q_ap_b-p_ap_b$ or $p_a+p_b-p_ap_b-q_bp_a$.
So choose passage $A$ if $q_ap_b<p_aq_b$, or $p_a/q_a>p_b/q_b$.
This means that you should choose in decreasing order of $p_i/q_i$.

Related Question