Dynamic programming and Bellman optimality principle

control theory, dynamic programming, optimal control, optimization, viscosity-solutions

Consider $$V(x)=\inf_{u \in \mathcal{U}}\int_{0}^{+\infty} e^{-\lambda t} f^0(y_x(t), u(t)) d t$$
which is the value function of an optimal control problem, where $y_x(t)$ is the solution of the state equation
\begin{equation}\label{state_eq_finite_dim}
\left\{\begin{array}{l}
y^{\prime}(t)=f(y(t), u(t)), t>0 \\
y(0)=x \in \mathbb{R}^n
\end{array}\right.
\end{equation}

Can I say that, at an intuitive level, $V$ should satisfy the following HJB equation
\begin{equation*}
\lambda v(x)-H(x, \nabla v(x))=0 \quad \text { in } \mathbb{R}^{n}
\end{equation*}

where
\begin{equation}
H(x, p)=\inf _{u \in U}\{f(x, u) \cdot p+f^0(x,u )\}
\end{equation}

By Bellman's optimality principle, we know that for every $t>0$ it holds that:
\begin{equation}
V(x)=\inf _{u \in \mathcal{U}}\left\{\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s+V\left(y_{x}(t)\right) e^{-\lambda t}\right\}
\end{equation}

Since the left-hand side of Bellman's optimality principle is independent of $t$ while the right-hand side depends on $t$, can we formally differentiate the right-hand side with respect to $t$ and set the derivative equal to zero? (In this way one gets the HJB equation.)
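
Spelled out, the formal computation would go roughly as follows. This is only a sketch under assumptions not stated above: $V$ is taken to be $C^1$, $f$ and $f^0$ continuous, and constant controls $u(s) \equiv u \in U$ are taken to be admissible. For such a constant control, Bellman's optimality principle gives, for every $t>0$,
\begin{equation*}
0 \leq \frac{1}{t}\left\{\int_{0}^{t} f^0\left(y_{x}(s), u\right) e^{-\lambda s} d s+V\left(y_{x}(t)\right) e^{-\lambda t}-V(x)\right\}
\end{equation*}
Letting $t \to 0^{+}$ and using $y_x^{\prime}(0)=f(x,u)$, the first term tends to $f^0(x,u)$ and the rest to the derivative of $t \mapsto V(y_x(t)) e^{-\lambda t}$ at $t=0$, so that
\begin{equation*}
0 \leq f^0(x, u)+\nabla V(x) \cdot f(x, u)-\lambda V(x) \quad \text { for every } u \in U
\end{equation*}
Taking the infimum over $u \in U$ gives $\lambda V(x)-H(x, \nabla V(x)) \leq 0$. If, in addition, an optimal control $u^*$ exists and is continuous at $t=0$, the same computation with equality in Bellman's principle gives
\begin{equation*}
0=f^0\left(x, u^*(0)\right)+\nabla V(x) \cdot f\left(x, u^*(0)\right)-\lambda V(x)
\end{equation*}
so the infimum is attained and equals zero, which is $\lambda V(x)-H(x, \nabla V(x))=0$. Without these regularity assumptions the argument is only heuristic.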

Best Answer

Too long to be a comment:

According to page 12 of the book you mentioned, the last expression you wrote, $h(t)=\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s+V\left(y_{x}(t)\right) e^{-\lambda t}$, will in general depend on $t$ for an arbitrary control. However, by the dynamic programming principle, this quantity should be constant in $t$ along the optimal trajectory.

This is equivalent to saying that if the fastest route from L.A. to Boston passes through Chicago, then it is also the concatenation of the fastest route from L.A. to Chicago and the fastest route from Chicago to Boston. The same goes for any other midpoint on the optimal route from L.A. to Boston.

In your case, this roughly means that for optimal trajectories the functional remains constant regardless of the midpoint $t$: the first term in $h(t)$, i.e. $\int_{0}^{t} f^0\left(y_{x}(s), u(s)\right) e^{-\lambda s} d s$, which is the cost up to $t$, and the second term, i.e. $V\left(y_{x}(t)\right) e^{-\lambda t}$, which is the (discounted) cost to go, are both optimal for any $t$. Thus $h'(t)=0$ (since $h$ is constant) along the optimal route. Then the next steps from the reference you gave lead to the HJB equation. Is this the part you wanted clarified? Please let me know.
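
As a concrete illustration, here is a scalar toy example. The data are my own assumption and are not taken from the reference: $n=1$, $\lambda>0$, $f(x,u)=u$, $f^0(x,u)=x^2+u^2$ and $U=\mathbb{R}$. Then
\begin{equation*}
H(x, p)=\inf _{u \in \mathbb{R}}\left\{u p+x^2+u^2\right\}=x^2-\frac{p^2}{4}, \quad \text { attained at } u=-\frac{p}{2}
\end{equation*}
and the HJB equation $\lambda v-H(x, v^{\prime})=0$ has the smooth solution
\begin{equation*}
v(x)=a x^2, \quad a=\frac{-\lambda+\sqrt{\lambda^2+4}}{2}, \quad \text { i.e. } a^2+\lambda a-1=0
\end{equation*}
The corresponding feedback is $u=-a y$, with trajectory $y_x(t)=x e^{-a t}$. Along it, taking $V=v$ (which I have not justified here by a verification argument),
\begin{equation*}
h(t)=\frac{1+a^2}{\lambda+2 a} x^2\left(1-e^{-(\lambda+2 a) t}\right)+a x^2 e^{-(\lambda+2 a) t}=a x^2
\end{equation*}
because $1+a^2=a(\lambda+2 a)$. So $h$ is indeed constant in $t$. For the non-optimal control $u \equiv 0$, by contrast, $y_x(t)=x$ and
\begin{equation*}
h(t)=\frac{x^2}{\lambda}\left(1-e^{-\lambda t}\right)+a x^2 e^{-\lambda t}, \quad h^{\prime}(t)=(1-\lambda a) x^2 e^{-\lambda t}=a^2 x^2 e^{-\lambda t}
\end{equation*}
which is strictly positive for $x \neq 0$, so $h$ genuinely depends on $t$ there.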
