[Math] Quasi-newton methods: SR1 and BFGS inverse update

numerical optimization

In Numerical Optimization by Nocedal and Wright, (http://home.agh.edu.pl/~pba/pdfdoc/Numerical_Optimization.pdf) Chapter 2 on unconstrained optimization, page 25 top, the authors claim that

"The equivalent formula for SR1, $$B_{k+1} = B_k + \frac{(y_k – B_k s_k) (y_k – B_k s_k)^T} {(y_k – B_k s_k)^T s_k}$$

and BFGS, $$B_{k+1} = B_k – \frac{B_ks_ks_k^T}{s_k^TB_ks_k} + \frac{y_ky_k^T}{y_k^Ts_k} $$

applied to the inverse approximation $$H_k = B_k^{-1} (definition) $$

is $$H_{k+1} = (I – \rho_k s_k y_k^T)H_k (I – \rho_k y_k s_k^T) + \rho_k s_k s_k^T$$ where $\rho_k = \frac{1}{y_k^T s_k}$"

I have expanded the inverse update given above to

$$H_{k+1} = H_k – \frac{H_ky_ks_k^T}{y_k^T s_k} – \frac{s_k y_k^T H_k}{y_k^T s_k} + \frac{s_k y_k^T H_k y_k s_k^T}{(y_k^T s_k)^2} + \frac{s_k s_k^T}{y_k^T s_k}$$

But from here, I cannot algebraically manipulate that into either of the non-inverse update formulas.

As well, I do not understand why, given that the SR1 and BFGS formulas are different updates with different guarantees, that the authors claim that they have the same inverse update.

===========================

EDIT: I still can't get the BFGS inverse update formula.
Here's my work:

I'm trying to do the updates separately since I cannot find a way to make the two rank-1 matrices form the product $UV^T$.

Using the higher order Sherman-Morrison formula,
$$B_k' = B_k – \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$$ choosing $$U=-\frac{1}{C} B_k s_k s_k^T$$ and $$V = B_k^T$$
gives

$${B'}_k^{-1} = \frac{1}{C} s_k s_k^T$$
$$B_{k+1} = B'_k + \frac{y_k y_k^T}{y_k^T s_k}$$ choosing $$U_2 = \frac{y_k}{K}$$ and $$V_2 = y_k$$

which gives
$$B_{k+1}^{-1} = -\frac{{B'}_k^{-1} y_k y_k^T {B'}_k^{-1}}{K} = -\frac{1}{K} (\frac{1}{C} s_k s_k^T) y_k y_k^T (\frac{1}{C} s_k s_k^T)$$ does not work out to give the update that I want.

Best Answer

You are right, the inverse approximation (2.21) seems to be for BFGS only. Compare with (6.17), page 140. The inverse approximation for SR1 is given in (6.25), page 144. To obtain inverses one applies the Sherman–Morrison formulae (A.27) and (A.28), page 612.

Edit:

Starting with the RHS of (6.19) and omitting subscript $k$ $$ B-\frac{Bss^TB}{s^TBs}+\frac{yy^T}{y^Ts}=\underbrace{B}_{A}+ \underbrace{\begin{bmatrix}Bs & y\end{bmatrix}}_{U} \underbrace{\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix} \begin{bmatrix}s^TB\\ y^T\end{bmatrix}}_{V^T}. $$ Then with $B^{-1}=H$ the inverse of the RHS is: \begin{align*} (A+UV^T)^{-1}&=H-H\begin{bmatrix}Bs & y\end{bmatrix} \left(I+\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix} \begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\begin{bmatrix}Bs & y\end{bmatrix}\right)^{-1}\begin{bmatrix}-\frac{1}{s^TBs} & 0\\0 & \frac{1}{y^Ts}\end{bmatrix}\begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\\ &=H-\begin{bmatrix}s & Hy\end{bmatrix} \left(\begin{bmatrix}-s^TBs & 0\\0 & y^Ts\end{bmatrix}+ \begin{bmatrix}s^TB\\ y^T\end{bmatrix}H\begin{bmatrix}Bs & y\end{bmatrix}\right)^{-1}\begin{bmatrix}s^T \\ y^TH\end{bmatrix}\\ &=H+\frac{1}{y^Tss^Ty}\left(\begin{bmatrix}s & Hy\end{bmatrix} \begin{bmatrix}y^T+y^Ty & -s^Ty\\-y^Ts & 0 \end{bmatrix}\begin{bmatrix}s^T \\ y^TH\end{bmatrix}\right)\\ &=H+\frac{1}{y^Tss^Ty}\left(sy^Tss^T+sy^THys^T-Hyy^Tss^T-ss^Ty^TH\right)\\ &=H+\left(1+\frac{y^THy}{y^Ts}\right)\frac{ss^T}{s^Ty}-\frac{Hys^T+(Hys^T)^T}{y^Ts}\\ \end{align*}

Related Question