When you directly minimize the cost functional (for instance using a discretization in time so that the integral becomes a summation), you are solving the optimal control problem for a single initial point $x_0$. Moreover, you will find an open-loop control, i.e. a function $t \mapsto u^*(t)$, which is not robust to perturbations in the dynamics (see Open-loop vs closed-loop).
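To make the first approach concrete, here is a minimal direct-minimization sketch for a single $x_0$. The problem data are purely illustrative (hypothetical scalar dynamics $x' = -x + u$ with running cost $x^2 + u^2$); the time integral is replaced by a forward-Euler summation and the whole control sequence is optimized at once:

```python
# Direct minimization of a discretized cost functional for ONE fixed
# initial state x0. Illustrative scalar problem: x' = -x + u,
# running cost x^2 + u^2; all parameters here are made up.
import numpy as np
from scipy.optimize import minimize

N, T = 50, 2.0      # number of time steps, horizon
dt = T / N
x0 = 1.0            # the single initial point we solve for

def cost(u):
    # Forward-Euler rollout of the dynamics, accumulating the running cost,
    # so the integral becomes a summation.
    x, J = x0, 0.0
    for k in range(N):
        J += (x**2 + u[k]**2) * dt
        x += (-x + u[k]) * dt
    return J

res = minimize(cost, np.zeros(N))   # open-loop control t -> u*(t)
u_star = res.x
```

The result `u_star` is an open-loop sequence valid only for this particular $x_0$; a different initial state requires re-solving the optimization from scratch, which is exactly the limitation the HJB approach removes.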
When you solve the HJB equation, instead, you are simultaneously solving the problem for all values of $x_0$, and you can even provide an optimal control in feedback form.
Regarding the question on the double minimization problem, you are right. There is a double minimization problem, but
- The dimension of the minimization problem is the dimension of the control space, which is typically very small, so it is not very expensive.
- Simulations have shown that the first minimization, the one while solving the HJB in step 1, does not need to be solved accurately, so one can discretize the control space with few controls and minimize over a finite set of possible control values. The second minimization problem, however, must be solved more accurately.
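The two-step structure above can be sketched on a toy 1D problem (illustrative data: dynamics $x' = u$, running cost $x^2 + u^2$, terminal cost $x^2$; grid sizes and control sets are arbitrary choices, not a recommendation). Step 1 solves the HJB backward in time with a *coarse* finite control set; step 2 extracts the feedback with a *finer* one:

```python
# Semi-Lagrangian sketch: coarse control set while solving the HJB
# (step 1), finer control set when extracting the feedback (step 2).
# All problem data are illustrative.
import numpy as np

xs = np.linspace(-2, 2, 81)          # state grid
us_coarse = np.linspace(-2, 2, 5)    # few controls suffice in step 1
us_fine = np.linspace(-2, 2, 201)    # step 2 is solved more accurately
dt, nt = 0.01, 100                   # time step, number of steps (T = 1)

V = xs**2                            # terminal condition V(x, T) = x^2
for _ in range(nt):                  # backward time stepping
    # V(x, t) = min_u [ (x^2 + u^2) dt + V(x + u dt, t + dt) ]
    cands = [(xs**2 + u**2) * dt + np.interp(xs + u * dt, xs, V)
             for u in us_coarse]
    V = np.min(cands, axis=0)

def feedback(x):
    # second minimization: evaluate the fine control set at one state x
    vals = [(x**2 + u**2) * dt + np.interp(x + u * dt, xs, V)
            for u in us_fine]
    return us_fine[int(np.argmin(vals))]
```

Even though step 1 only ever compares five control values per node, the feedback extracted in step 2 behaves sensibly: it pushes the state toward the origin from either side.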
I recommend this book for more details.
For reference, I am re-posting the answer from MO here:
On the role of the verification theorem: it is an issue related to the existence and uniqueness of solutions in the classical sense for the HJB PDE. In applying the verification theorem, we ignore such issues, guess the structure of a smooth value function, formally verify (by substitution) that the guessed structural form satisfies the HJB PDE under consideration, and then use Bellman's principle of optimality to compute the optimal control. Whether such verification is valid remains contingent on the existence and uniqueness of a smooth enough classical solution (at least $C^1$ in the deterministic case and $C^2$ in the stochastic case) for the HJB PDE.
On deterministic versus stochastic: the above verification/HJB classical-solution issue arises in both the deterministic and the stochastic case. For example, see Ch. 4, Sec. 2 of [1], which specifically discusses verification theorems for first-order HJB PDEs in deterministic optimal control. Example 2.3 there concerns a 1D deterministic optimal control problem whose HJB PDE does not admit any $C^{1}([0,T] \times \mathbb{R})$ solution. Ch. 4 and Ch. 5 of that book discuss the verification theorems for both deterministic and stochastic optimal control problems, as well as viscosity solutions.
[1] J. Yong and X. Y. Zhou, *Stochastic Controls: Hamiltonian Systems and HJB Equations*, vol. 43, Springer, New York, 1999.
Best Answer
My short answer would be that they are essentially the same thing.
Explanation
Riccati Equations can be derived from Hamilton-Jacobi-Bellman equations in the particular case of LQR problem, an optimal control problem where the dynamics is linear and the cost is quadratic.
Consider the finite-horizon LQR problem. In this particular case, the Hamilton-Jacobi-Bellman equation has the expression (for simplicity I put $N=0$)
$$ \partial_t V(x,t) + \min_u \left\{ \partial_x V(x,t) \cdot (Ax+Bu) + x^T Q x + u^T Ru \right\} = 0 $$ with the terminal condition $$ V(x,T) = x^T Q_f x . $$
Now we look for solutions of the form $V(x,t) = x^T P(t) x$, where $P(t)$ is a symmetric matrix for each $t \in [0,T]$. If we substitute this expression in the (HJB) equation, we get
$$ x^T P'(t) x + \min_u \left\{ 2 P(t)x \cdot (Ax+Bu) + x^T Q x + u^T Ru \right\} = 0 . $$
We can explicitly find the minimum of the expression inside the curly brackets. For a given pair $(t,x)$, let us define $\Phi$ as $$ \Phi(u) = 2 P(t)x \cdot (Ax+Bu) + x^T Q x + u^T Ru .$$
The minimum is obtained when $∇\Phi(u) = 0$, that is when $$ 2B^T P(t) x + 2 Ru = 0,$$ so the optimal control is $$ u^*(t,x) = -R^{-1} B^T P(t)x ,$$ with $$ \Phi(u^*(t,x)) = 2 x^T P(t)^T \left(Ax-BR^{-1} B^T P(t)x \right) + x^T Q x + x^T P(t)^T B R^{-1} B^T P(t)x $$
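As a quick numerical sanity check of this stationarity condition, one can verify with arbitrary data that the gradient $2 B^T P(t) x + 2 R u$ indeed vanishes at $u^* = -R^{-1} B^T P(t) x$, and that $\Phi$ is larger at nearby controls (the dimensions and matrices below are illustrative, not tied to any specific system):

```python
# Numerical check: grad Phi vanishes at u* = -R^{-1} B^T P x and
# Phi(u*) is a minimum. All matrices here are arbitrary test data.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q = np.eye(n)
R = 2.0 * np.eye(m)        # R must be positive definite
P = np.eye(n)              # any symmetric P(t)
x = rng.standard_normal(n)

def Phi(u):
    # Phi(u) = 2 P x . (A x + B u) + x^T Q x + u^T R u
    return 2 * (P @ x) @ (A @ x + B @ u) + x @ Q @ x + u @ R @ u

u_star = -np.linalg.solve(R, B.T @ P @ x)
grad = 2 * B.T @ P @ x + 2 * R @ u_star   # should be ~0 at the minimizer
```

Since $R$ is positive definite, $\Phi$ is strictly convex in $u$, so this stationary point is the unique global minimum.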
So we can rewrite the (HJB) equation without the minimization term: $$ x^T P'(t) x + 2 x^T P(t)^T \left(Ax-BR^{-1} B^T P(t)x \right) + x^T Q x + x^T P(t)^T B R^{-1} B^T P(t)x = 0 , $$ and by grouping terms and using the symmetry of $P(t)$ we get $$ x^T \left( P'(t) + 2 P(t) A - P(t) BR^{-1} B^T P(t) + Q \right) x = 0 . $$
Since the above quadratic form must vanish for every $x$, the symmetric part of the matrix in brackets must be zero. Noting that $x^T \left( 2 P(t) A \right) x = x^T \left( P(t) A + A^T P(t) \right) x$, this is equivalent to the matrix Riccati differential equation
$$ P'(t) + P(t) A + A^T P(t) - P(t) BR^{-1} B^T P(t) + Q = 0 .$$
Finally, in order to satisfy the terminal condition, we must have $$ P(T) = Q_f .$$
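This Riccati ODE is straightforward to integrate numerically backward in time from $P(T) = Q_f$. The sketch below uses the symmetric form of the equation (equivalent as a quadratic form, since $P$ is symmetric) on illustrative data, a double integrator with identity weights; the resulting $P(0)$ gives the feedback gain $u^*(0,x) = -R^{-1} B^T P(0) x$:

```python
# Backward integration of the Riccati ODE from P(T) = Q_f.
# System data (double integrator, identity weights) are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

n = 2
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
R = np.array([[1.0]])
Qf = np.eye(n)
T = 5.0

def riccati_rhs(t, p):
    # Symmetric form: P' = -(P A + A^T P - P B R^{-1} B^T P + Q)
    P = p.reshape(n, n)
    dP = -(P @ A + A.T @ P - P @ B @ np.linalg.inv(R) @ B.T @ P + Q)
    return dP.ravel()

# integrate backward in time, from t = T down to t = 0
sol = solve_ivp(riccati_rhs, [T, 0.0], Qf.ravel(), rtol=1e-8, atol=1e-8)
P0 = sol.y[:, -1].reshape(n, n)
K0 = np.linalg.inv(R) @ B.T @ P0   # feedback gain: u*(0, x) = -K0 @ x
```

For a horizon this long, $P(0)$ is already close to the stationary solution of the algebraic Riccati equation, which for this system is $P = \begin{pmatrix} \sqrt{3} & 1 \\ 1 & \sqrt{3} \end{pmatrix}$.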
Comments: