How to derive Bellman’s principle of optimality in discrete time

control theory · dynamic programming · optimal control · optimization

In the context of discrete-time optimal control theory, Bellman's principle of optimality is useful for efficiently determining the control signal $\{u_k\}_{k=0}^{N-1}$ that minimizes the following objective function
$$
J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) = \sum_{k=0}^{N-1} g_k (x_k,u_k) + g_N(x_N)
$$

where $\{x_k\}_{k=0}^{N}$ is the trajectory of the state vector $x_k$, and $\{g_k\}_{k=0}^{N}$ are functions that are chosen such that $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$ has a unique global minimum with respect to $\{u_k\}_{k=0}^{N-1}$. Moreover, $x_k$ evolves according to the difference equation $$x_{k+1} = f(x_k,u_k)$$ Bellman's principle of optimality states:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

How can I relate this principle to the minimization of $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$ with respect to the control signal $\{u_k\}_{k=0}^{N-1}$?
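
For concreteness, here is a minimal sketch (in Python; the function names and the scalar example system are just illustrative choices of mine) of how I understand $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$ being evaluated for a given control sequence, by simulating the dynamics forward:

```python
def evaluate_cost(x0, u_seq, f, g_stage, g_terminal):
    """Evaluate J(x0, {u_k}) by simulating x_{k+1} = f(x_k, u_k)."""
    x, J = x0, 0.0
    for k, u in enumerate(u_seq):
        J += g_stage(k, x, u)   # accumulate the stage cost g_k(x_k, u_k)
        x = f(x, u)             # advance the state
    return J + g_terminal(x)    # add the terminal cost g_N(x_N)

# Example: scalar system x_{k+1} = x_k + u_k with quadratic costs (N = 3).
f = lambda x, u: x + u
g_stage = lambda k, x, u: x**2 + u**2
g_terminal = lambda x: x**2
print(evaluate_cost(1.0, [0.0, -0.5, -0.25], f, g_stage, g_terminal))
```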

Best Answer

Bellman's principle of optimality becomes apparent when the optimization problem $$ \min_{u_0,u_1,\dots,u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) $$ is decomposed into smaller sub-problems as follows. First, because minimizing jointly over all the controls is equivalent to minimizing over them one at a time, $$ \min_{u_0,u_1,\dots,u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) = \min_{u_0} \left[ \min_{u_1} \left[ \min_{u_2} \left[ \cdots \left[ \min_{u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) \right]\right]\right]\right] \tag{1} \label{eq:decomp} $$ we can rewrite the optimization problem as \begin{align} \min_{u_0} \left[ \min_{u_1} \left[ \min_{u_2} \left[ \cdots \left[ \min_{u_{N-1}} \sum_{k=0}^{N-1} g_k (x_k,u_k) + g_N(x_N) \right]\right]\right]\right] \end{align} Expanding the inner summation gives $$ \min_{u_0} \left[\min_{u_1} \left[\min_{u_2} \left[\cdots \left[\min_{u_{N-1}} g_0 (x_0,u_0) + g_1 (x_1,u_1) + g_2 (x_2,u_2) + \cdots + g_{N-1} (x_{N-1},u_{N-1}) + g_N(x_N) \right]\right]\right]\right] $$

Because $x_k$ is determined by $x_0$ and $u_0,\dots,u_{k-1}$, each term $g_k(x_k,u_k)$ depends only on $u_0,\dots,u_k$, and is therefore constant with respect to the $\min$ operators over the later controls $u_{k+1},\dots,u_{N-1}$. We can thus move each term out of those operators as follows (I've broken up the expression over multiple lines for clarity, as it gets too long to fit on one line) $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) \ + \\ &\Big[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) \ + \\ &\Big[g_N(x_N)\Big]\Big]\Big]\Big]\Big]\Big] \end{align} $$

We then let $J_N(x_N) = g_N(x_N)$, which represents the terminal cost of being in the final state $x_N$. Using this substitution for $g_N(x_N)$, the optimization problem becomes $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) \ + \\ &\Big[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N(x_N)\Big]\Big]\Big]\Big]\Big] \end{align} $$

Next, we let $$ \begin{align} J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} \left[ g_{N-1} (x_{N-1},u_{N-1}) + J_N(x_N) \right] \end{align} $$ Because $x_N = f\left(x_{N-1},u_{N-1}\right)$, this is equivalent to $$ \begin{align} J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} \left[ g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right) \right] \end{align} $$ Note that $J_{N-1}(x_{N-1})$ represents the minimal cost, with respect to $u_{N-1}$, of a one-stage process with initial condition $x_{N-1}$. Using this substitution, the optimization problem becomes $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1})\Big]\Big]\Big]\Big] \end{align} \tag{2} \label{eq:opt_prob} $$

To see where Bellman's principle of optimality comes from, we just need to perform one last substitution: \begin{align} J_{N-2}(x_{N-2}) &= \min_{u_{N-2}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1}) \right] \\ &= \min_{u_{N-2}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}\left(f\left(x_{N-2},u_{N-2}\right)\right) \right] \end{align} Here, $J_{N-2}(x_{N-2})$ represents the minimal cost, with respect to $u_{N-2}$ and $u_{N-1}$, of a two-stage process with initial condition $x_{N-2}$. In this context, Bellman's principle of optimality states that if $J_{N-2}(x_{N-2})$ is minimized with respect to $u_{N-2}$ and $u_{N-1}$, then, regardless of the initial condition $x_{N-2}$ and the chosen control decision $u_{N-2}$, $J_{N-1}(x_{N-1})$ must be minimized with respect to $u_{N-1}$.
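
To make the decomposition concrete, here is a small numerical sanity check (a sketch; the two-point control set, scalar dynamics, and quadratic costs are arbitrary illustrative choices) that the joint minimization on the left of \eqref{eq:decomp} agrees with evaluating the nested minimizations recursively, with each stage cost moved outside the inner $\min$ operators as derived above:

```python
from itertools import product

# Illustrative problem: scalar state, two-point control set, N = 3 stages.
U = [-1.0, 1.0]
N = 3
f = lambda x, u: 0.5 * x + u
g = lambda k, x, u: x**2 + 0.1 * u**2   # stage cost g_k (time-invariant here)
gN = lambda x: x**2                     # terminal cost g_N

def cost(x0, us):
    """J(x0, {u_k}): simulate the dynamics and accumulate the costs."""
    x, J = x0, 0.0
    for k, u in enumerate(us):
        J += g(k, x, u)
        x = f(x, u)
    return J + gN(x)

def nested(x, k):
    """Nested single-variable minimizations, as on the right of (1)."""
    if k == N:
        return gN(x)
    return min(g(k, x, u) + nested(f(x, u), k + 1) for u in U)

# Joint minimization over all control sequences, as on the left of (1).
joint = min(cost(2.0, us) for us in product(U, repeat=N))

print(joint, nested(2.0, 0))   # the two values agree
```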
To see why this must be true, we expand the $J_{N-1}(x_{N-1})$ term in the expression above for $J_{N-2}(x_{N-2})$ to get \begin{align} J_{N-2}(x_{N-2}) &= \min_{u_{N-2}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1}) \right] \\ &= \min_{u_{N-2}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + \left[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right)\right]\right] \tag{3} \label{eq:bellman_part2} \\ &= \min_{u_{N-2},u_{N-1}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right) \right] \tag{4} \label{eq:bellman_part1} \end{align} where the last equality again uses the interchange of minimizations from \eqref{eq:decomp}. We can see from \eqref{eq:bellman_part1} that $J_{N-2}(x_{N-2})$ is indeed minimized with respect to both $u_{N-2}$ and $u_{N-1}$. Additionally, from \eqref{eq:bellman_part2}, we see that the inner minimization over $u_{N-1}$ is carried out for whichever state $x_{N-1} = f(x_{N-2},u_{N-2})$ results from the first decision; that is, $J_{N-1}(x_{N-1})$ is minimized with respect to $u_{N-1}$ regardless of $x_{N-2}$ and $u_{N-2}$. In essence, Bellman's principle of optimality is a consequence of the property described in \eqref{eq:decomp} and the additive structure of the objective function $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$.
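
The equality of \eqref{eq:bellman_part2} and \eqref{eq:bellman_part1} can also be checked numerically. This is a sketch under assumed illustrative choices (a three-point control set, scalar dynamics, and quadratic costs):

```python
from itertools import product

# Illustrative two-stage check of (3) = (4) at some state x_{N-2}.
U = [-1.0, 0.0, 1.0]
f = lambda x, u: x + u
g = lambda x, u: x**2 + u**2     # stage cost (same at both stages here)
JN = lambda x: x**2              # terminal cost J_N = g_N

x = 3.0  # an arbitrary x_{N-2}

# (4): joint minimization over (u_{N-2}, u_{N-1}).
joint = min(
    g(x, u1) + g(f(x, u1), u2) + JN(f(f(x, u1), u2))
    for u1, u2 in product(U, repeat=2)
)

# (3): outer min over u_{N-2} of [stage cost + inner min over u_{N-1}].
J_Nm1 = lambda x1: min(g(x1, u2) + JN(f(x1, u2)) for u2 in U)
nested = min(g(x, u1) + J_Nm1(f(x, u1)) for u1 in U)

print(joint, nested)   # the two values agree
```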

One other important concept to be aware of is that $J_{N-m}(x_{N-m})$ represents both the minimal cost of an $m$-stage process with initial condition $x_{N-m}$ and the minimal cost of the final $m$ stages of an $N$-stage process whose state at time-step $k = N-m$ is $x_{N-m}$. In other words, as long as $N \geq m$, every $m$-stage process is embedded inside an $N$-stage process. This means that if $\{u_k^*\}_{k=0}^{N-1}$ are the optimal controls for the $N$-stage process with initial condition $x_0$, then $\{u_k^*\}_{k=N-m}^{N-1}$ are the optimal controls for the $m$-stage process whose initial condition is the state $x_{N-m}$ reached under those optimal controls. This is another way of stating Bellman's principle of optimality, as illustrated in the sketch below.
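
Here is a brute-force check of this embedding property (again a sketch under assumed illustrative choices; the stage cost is taken time-invariant so the $m$-stage costs match the final $m$ stages, and the minimizer is assumed unique so that ties cannot spoil the comparison):

```python
from itertools import product

# Check that the tail of an optimal N-stage sequence is itself optimal
# for the corresponding m-stage process.
U = [-1.0, 0.0, 1.0]
N, m = 4, 2
f = lambda x, u: 0.5 * x + u
g = lambda x, u: x**2 + u**2
gN = lambda x: x**2

def cost(x0, us):
    x, J = x0, 0.0
    for u in us:
        J += g(x, u)
        x = f(x, u)
    return J + gN(x)

def state_at(x0, us, k):
    """State reached at time-step k when applying controls us from x0."""
    x = x0
    for u in us[:k]:
        x = f(x, u)
    return x

x0 = 2.0
# Brute-force optimal controls for the N-stage process from x0.
best_N = min(product(U, repeat=N), key=lambda us: cost(x0, us))
# State reached at time-step N - m under the optimal sequence.
x_tail = state_at(x0, best_N, N - m)
# Brute-force optimal controls for the m-stage process from x_{N-m}.
best_m = min(product(U, repeat=m), key=lambda us: cost(x_tail, us))

print(best_N[N - m:] == best_m)   # True: the tail is the m-stage optimum
```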

Going back to the optimization problem in \eqref{eq:opt_prob}, if we proceed with the rest of the substitutions down to $k=0$, we get the recurrence relation $$J_{k}(x_k) = \min_{u_k} \left[g_k(x_k,u_k) + J_{k+1}(x_{k+1})\right]$$ for $k = 0,1,\dots,N-1$, with boundary condition $J_N(x_N) = g_N(x_N)$. The recurrence is solved backward in time, starting from the boundary condition at $k = N$, and $J_0(x_0)$ is the minimal value of $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$.
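
To close the loop, here is a minimal sketch of solving this backward recurrence numerically by discretizing the state and control spaces. All grids, dynamics, and costs are illustrative choices, and the nearest-neighbor interpolation of $J_{k+1}$ is deliberately crude:

```python
import numpy as np

# Backward dynamic programming over a discretized state/control grid.
X = np.linspace(-2.0, 2.0, 81)        # state grid
U = np.linspace(-1.0, 1.0, 21)        # control grid
N = 10
f = lambda x, u: 0.9 * x + u
g = lambda x, u: x**2 + 0.1 * u**2    # stage cost (time-invariant here)
gN = lambda x: x**2                   # terminal cost

def nearest(xg, x):
    return int(np.abs(xg - x).argmin())   # snap a state to the grid

# Boundary condition: J_N(x) = g_N(x).
J = gN(X)
policy = []

# Recurrence: J_k(x) = min_u [ g_k(x, u) + J_{k+1}(f(x, u)) ], k = N-1, ..., 0.
for k in range(N - 1, -1, -1):
    Q = np.array([[g(x, u) + J[nearest(X, f(x, u))] for u in U] for x in X])
    policy.insert(0, U[Q.argmin(axis=1)])   # u_k*(x) for each grid state
    J = Q.min(axis=1)                       # J_k on the grid

print("J_0 at x_0 = 1.0:", J[nearest(X, 1.0)])
print("u_0* at x_0 = 1.0:", policy[0][nearest(X, 1.0)])
```

Note that the backward pass produces the optimal feedback policy $u_k^*(x_k)$ as a by-product, which is precisely what makes the recurrence more efficient than a brute-force search over all control sequences.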