Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
- It is a method applied to optimization problems.
- DP problems exhibit optimal substructure, i.e., the optimal solution to a problem contains the optimal solution to a subproblem.
- These subproblems are not independent of each other, but are overlapping.
- There are two approaches: one is bottom-up (tabulation), the other is top-down (memoization).
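The two approaches in the last bullet can be sketched with the classic Fibonacci example (a standard illustration chosen here, not one from the question itself): both exploit optimal substructure and overlapping subproblems, differing only in the order subproblems are solved.

```python
from functools import lru_cache

# Top-down: plain recursion plus memoization; each overlapping
# subproblem fib(k) is computed once and cached.
@lru_cache(maxsize=None)
def fib_top_down(n: int) -> int:
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up: tabulation; subproblems are solved in increasing order
# of size, so each answer is ready when it is needed.
def fib_bottom_up(n: int) -> int:
    if n < 2:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr
```

Both functions return the same values; the choice between them is about evaluation order and memory layout, not about the recurrence itself.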
I have the following questions:
- Is this understanding of DP comprehensive? Does every DP algorithm have an optimal substructure with overlapping subproblems?
- How do policy iteration and value iteration fit into this scheme? Can we call them bottom-up or top-down?
Best Answer
Have you seen David Silver's lecture on dynamic programming? Note that Bellman himself coined the term "dynamic programming"; his first book, titled Dynamic Programming, was published in 1957 (see Wikipedia).
I think there is also a difference from "classical" DP: in policy iteration and value iteration we apply update steps iteratively until convergence, whereas in classical DP the updates can be done in a single pass.
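The iterate-until-convergence point can be made concrete with a minimal value iteration sketch on a tiny hand-made MDP (the 3-state chain, its dynamics, and the reward table are assumptions for illustration, not something from the question):

```python
import numpy as np

# Hypothetical toy MDP: 3 states in a chain, actions 0 = stay, 1 = right.
# Entering state 2 from state 1 yields reward 1; everything else is 0.
n_states, n_actions, gamma = 3, 2, 0.9
next_state = [[0, 1], [1, 2], [2, 2]]          # deterministic transitions
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # reward[s][a]

V = np.zeros(n_states)
# Unlike one-pass classical DP, we sweep over all states repeatedly
# and stop only when the value function has (numerically) converged.
while True:
    V_new = np.array([
        max(reward[s][a] + gamma * V[next_state[s][a]]
            for a in range(n_actions))
        for s in range(n_states)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

The sweeps converge here because the Bellman optimality backup is a gamma-contraction; there is no natural "increasing subproblem size" to order a single pass by, which is why the bottom-up/top-down labels from classical DP fit awkwardly.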