[Math] Derive Hessian inverse update using Sherman-Morrison in Quasi Newton Method

convex optimizationhessian-matrixlinear algebranumerical linear algebranumerical optimization

From Nocedal's Numerical Optimization book, the Hessian approximation equation is given using Symmetric Rank 1 (SR1) formula as follows:

$$B_{k+1} = B_k + \frac{(y_k – B_ks_k)(y_k – B_ks_k)}{(y_k – B_ks_k)^Ts_k}^T$$

where
$y_k = \nabla f(x_{k+1}) – \nabla f(x_k)$
and
$sk = x_{k+1} – x_k$

It further states that using Sherman-Morrison formula (Consider $\bar A = A + ab^T$, then $A^{-1} = \bar A^{-1} – \frac {A^{-1}ab^TA^{-1}}{1+b^TA^{-1}a}$ for $\bar A$ matrix)

The equation of inverse of Hessian can be derived using the above formula as follows:

$$H_{k+1} = H_k + \frac{(s_k – H_ky_k)(s_k – H_ky_k)^T}{(s_k – H_ky_k)y_k}$$

The problem is that to me that derivation is not obvious and I've tried solving this algebrically but I'm unable to get the derivation right from $H_{k+1}$ by manipulating $B_{k+1}$.

Best Answer

In Page 144 (chapter Quasi Newton methods) I saw what you referred to. But even in page 140 he used the Sherman inversion formula to go between the hessian update and the hessian inverse update. Anyway let me show how we go from $(6.24)$ to $(6.25)$.

excerpt from the Nocedal textbook

If you check the appendex you can see the formula,

from the appendix SM formula

First we need to write the rank one update of $B_k$,

$$ B_{k+1} = B_k + \alpha(y_k - B_ks_k) \times (y_k - B_k s_k)^T$$

Here $\alpha = \frac{1}{(y_k - B_ks_k)^Ts_k}$ is a scalar and easier to handle this way. Note that $a = \alpha(y_k - B_ks_k)$ and $b = (y_k - B_k s_k)$.

So now using $(A.27)$,

$$B_{k+1}^{-1} = H_{k+1} = B_k^{-1} - \frac{B_k^{-1}\alpha(y_k - B_ks_k)(y_k - B_k s_k)^T B_k^{-1}}{1 + (y_k - B_k s_k)^T B_k^{-1} \alpha(y_k - B_ks_k)} $$

Since $B_k^{-1} = H_k$, $(A+B)^T = A^T + B^T$ and both $H$ and $B$ are symmetric, $H^T = H$,

$$H_{k+1} = H_k - \frac{\alpha(H_ky_k - s_k)(y_k^TH_k^T - s_k^TB_kB_k^{-1})}{1 + \alpha(y_k^TH_k - s_k^TB_kB_k^{-1})(y_k - B_ks_k)} $$

$$H_{k+1} = H_k - \frac{( s_k - H_ky_k)( s_k^T - y_k^TH_k^T)}{\alpha^{-1} + (y_k^T H_ky_k - y_k^Ts_k - s_k^Ty_k + s_k^TB_ks_k)} $$

$$H_{k+1} = H_k - \frac{( s_k - H_ky_k)( s_k^T - (H_ky_k)^T)}{(y_k - B_ks_k)^Ts_k + (y_k^T H_ky_k - y_k^Ts_k - s_k^Ty_k + s_k^TB_ks_k)} $$

$$H_{k+1} = H_k + \frac{( s_k - H_ky_k)(s_k - H_ky_k)^T}{(s_k^Ty_k - y_k^T H_ky_k)} $$

Remember $a^Tb = b^Ta$,

$$H_{k+1} = H_k + \frac{( s_k - H_ky_k)(s_k - H_ky_k)^T}{(s_k^Ty_k - (H_ky_k)^Ty_k )} $$

$$H_{k+1} = H_k + \frac{( s_k - H_ky_k)(s_k - H_ky_k)^T}{(s_k - H_ky_k)^Ty_k} $$

Thus proved.

Related Question