[Math] Cost-to-go form of Dynamic Programming algorithm

Tags: dynamic programming, operations research, terminology

My lecture notes for Mat-2.3148 (in Finnish) define the dynamic-programming algorithm by $J_N(x_N)=g_N(x_N)$ and $J_k(x_k)=\min_{u_k}\left\{g_k(x_k,u_k)+J_{k+1}(f_k(x_k,u_k))\right\}$, where

  • the state $x_k$,
  • the state equation $x_{k+1}=f_k(x_k,u_k)$,
  • the control $u_k$,
  • the cost/benefit function $g_k(x_k,u_k)$,
  • the boundary conditions ($x_0$, $x_N$), and
  • the objective function $J_k(x_k)=\sum g_k(x_k,u_k)$.
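To make the backward recursion concrete, here is a minimal Python sketch on a toy finite-state problem; the dynamics `f`, stage cost `g`, and terminal cost `g_N` below are hypothetical examples for illustration, not taken from the lecture:

```python
# Backward DP recursion J_k(x) = min_u { g(x,u) + J_{k+1}(f(x,u)) }
# on a made-up toy problem (f, g, g_N are illustrative assumptions).

N = 3                       # horizon
states = range(5)           # x_k in {0, ..., 4}
controls = [-1, 0, 1]       # u_k

def f(x, u):                # state equation x_{k+1} = f(x, u), clipped to the state set
    return min(max(x + u, 0), 4)

def g(x, u):                # stage cost g_k(x, u)
    return x * x + abs(u)

def g_N(x):                 # terminal cost
    return x * x

# J[k][x] = optimal cost-to-go from state x at stage k
J = [None] * (N + 1)
J[N] = {x: g_N(x) for x in states}
for k in range(N - 1, -1, -1):
    J[k] = {x: min(g(x, u) + J[k + 1][f(x, u)] for u in controls)
            for x in states}

print(J[0])                 # optimal cost from each initial state
```

The recursion runs backward from the terminal stage $N$ to stage $0$, exactly as in the definition above.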

The next slide uses the cost-to-go. Suppose a discounted $N$-horizon problem, so that $J_{N-k}(x)=\min_u\left\{\alpha^{N-k}g(x,u)+J_{N-k+1}(f(x,u))\right\}$, where $k=1,\dots,N$, with initial condition $J_N(x)=\alpha^N J(x)$. Then the cost-to-go form is

$$V_{k+1}(x)=\min_u\left\{g(x,u)+\alpha V_k(f(x,u))\right\}$$

where the initial condition is $V_0(x)=J_N(x)$, and $V_k(x)$ is the optimal cost, a.k.a. the value function, when you begin from state $x$ with $k$ steps to go.
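As a sketch of this cost-to-go iteration on a hypothetical toy problem (the dynamics and cost below are made up, not from the slides), note how the discount $\alpha$ multiplies the previous value function at each step:

```python
# Cost-to-go recursion V_{k+1}(x) = min_u { g(x,u) + alpha * V_k(f(x,u)) }
# on a made-up toy problem; f and g are illustrative assumptions.

alpha = 0.9
states = range(5)
controls = [-1, 0, 1]

def f(x, u):                    # toy state equation, clipped to the state set
    return min(max(x + u, 0), 4)

def g(x, u):                    # toy stage cost
    return x * x + abs(u)

V = {x: 0.0 for x in states}    # V_0: zero terminal cost here
for k in range(50):             # with alpha < 1 this converges as k grows
    V = {x: min(g(x, u) + alpha * V[f(x, u)] for u in controls)
         for x in states}

print(V)                        # approximate infinite-horizon value function
```

Unlike the finite-horizon recursion, this version can simply be iterated until it converges, which is what makes the cost-to-go notation convenient for long or infinite horizons.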

What is this cost-to-go algorithm? Is it just a bounded, discounted, possibly-infinite-horizon problem written in the so-called "cost-to-go" notation?

Best Answer

The DP algorithm is a discrete-time algorithm. Its generalization to continuous time is the Hamilton-Jacobi-Bellman (HJB) equation; the derivation from the discrete-time case is on page 91 of Bertsekas's DP book (1995). It is not always possible to execute the DP algorithm because of excessive computational requirements, and the same holds for the HJB equation. "Cost-to-go" is an umbrella term used in both contexts, the DP algorithm and the HJB equation. Intuitively, it is the cost of what remains of the recursion; at least this is my understanding!

We have the cost function discounted into the value function. If $α∈(0,1)$, the problem behaves nicely because the geometric series converges (Bertsekas DP, p. 308). The $α$ can also be interpreted as a probability of continuing for one more step, so $α^N J(x)$ would be the expected cost after $N$ steps.
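A quick numeric check of that geometric-series argument: if the stage cost is bounded by some $M$ and $α∈(0,1)$, the total discounted cost is bounded by $M/(1-α)$. The values of `M` and `alpha` below are arbitrary choices for illustration:

```python
# Numeric check: a truncated geometric series of discounted costs
# stays within the bound M / (1 - alpha).  M and alpha are arbitrary.
alpha, M = 0.9, 5.0
total = sum(M * alpha**k for k in range(10_000))  # truncated discounted sum
bound = M / (1 - alpha)
print(total, bound)
```

This is why discounting with $α<1$ keeps even an infinite-horizon cost finite.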
