I am interested in deriving the unique solution to the following Quasi-Newton minimization problem
$$
\min_{H}|| H-H_k||\\
H=H^\top,\quad Hy_k =s_k
$$ The norm is the weighted Frobenius norm
$$
||A||_W \equiv ||W^{1/2}A W^{1/2}||_F
$$where the weight matrix $W$ is any positive definite matrix that satisfies $Ws_k=y_k$, and $||\cdot||_F$ is defined by $||C||^2_F= \sum_{i,j}^n c^2_{ij}$. The quantity $H$ is the inverse hessian which is symmetric, positive definite and satisfies the secant equation above, $Hy_k=s_k$. We can assume that $W=G^{-1}_k $ where $G_k$ the average hessian is given by
$$
G_k= \int_0^1 \nabla^2 f(x_k+\tau \alpha_k p_k)d\tau
$$
The unique solution is given by
$$
H_{k+1} = (1-\rho_k s_k y^\top_k)H_k (1-\rho_k y_k s^\top_k)+ \rho_k s_k s^\top_k
$$ where $\rho_k = (y^\top_k s_k)^{-1}$. Note, this is an iterative scheme where $k$ represents the current iteration and $H_{k+1}$ is an approximation to the inverse hessian.
The notation I am using is directly from the book Nocedal & Wright – Numerical Optimization. I am not able to find a full derivation of this anywhere, everything written above is all that Nocedal/Wright has in regards to this topic. For a reference, this is in chapter 6 – Quasi Newton Methods, of their newest/2nd edition.
All links I have tried to google and other books also have no full derivation. I am not able to find anything more thorough than Nocedal & Wright however they still don't have a derivation. Thanks
Best Answer
I'll outline the major steps. See the explanations and answers to the comments below.
Explanations:
Step 1. The convenience for $\hat H$ and $\hat H_k$ comes directly from the problem $$ \min\|\underbrace{W^{1/2}H_kW^{1/2}}_{\hat H_k}-\underbrace{W^{1/2}HW^{1/2}}_{\hat H}\|_F. $$ Then we have to rewrite the data too \begin{align} Hy_k=s_k\quad&\Leftrightarrow\quad \underbrace{\color{blue}{W^{1/2}}H\color{red}{W^{1/2}}}_{\hat H}\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}=\underbrace{\color{blue}{W^{1/2}}s_k}_{\hat s_k},\\ Ws_k=y_k\quad&\Leftrightarrow\quad \underbrace{\color{red}{W^{-1/2}}Ws_k}_{\hat s_k}=\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}. \end{align} Thus, $\hat H\hat y_k=\hat y_k$. It is also equal to $\hat s_k$.
Step 2. Since $\hat Hu=u$ we know that $u^T\hat Hu=u^Tu=1$ and $u_\bot^THu=u_\bot^Tu=0$, so we can represent the optimizing variable as $$ U^T\hat HU=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}= \begin{bmatrix}u^T\hat Hu & u^T\hat Hu_\bot\\u_\bot^T\hat Hu & u_\bot^T\hat Hu_\bot\end{bmatrix}= \begin{bmatrix}1 & 0\\0 & u_\bot^T\hat Hu_\bot\end{bmatrix}. $$ It gives the following \begin{align} U^T\hat H_kU-U^T\hat HU&=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H_k\begin{bmatrix}u & u_\bot\end{bmatrix}-\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}=\\ &=\begin{bmatrix}\color{blue}{u^T\hat H_ku} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot}\end{bmatrix}-\begin{bmatrix}\color{blue}{1} & \color{blue}{0}\\\color{blue}{0} & \color{red}{u_\bot^T\hat Hu_\bot}\end{bmatrix}. \end{align} This particular structure of the optimizing variable was the whole idea to switch to the new basis. Because $\hat H$ has no freedom to vary in the blue part, we cannot change the corresponding blue part of $\hat H_k$, so it is fixed for all possible $\hat H$. The red part though can be changed as we wish, and the smallest Frobenius norm \begin{align} \|U^T(\hat H_k-\hat H)U\|_F^2&= \left\|\begin{bmatrix}\color{blue}{u^T\hat H_ku-1} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot}\end{bmatrix}\right\|_F^2=\\ &=\color{blue}{(u^T\hat H_ku-1)^2+\|u^T\hat H_ku_\bot\|_F^2+\|u_\bot^T\hat H_ku\|_F^2}+\color{red}{\|u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot\|_F^2} \end{align} is obtained when the red part is zero.
Step 3. The matrix $U$ is orthogonal, hence, $$ I=UU^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\quad\Leftrightarrow\quad u_\bot u_\bot^T=I-uu^T. $$