How the Hessian matrix of the robustified Gauss-Newton method for optimization is computed

approximation, calculus, hessian-matrix, optimization, parameter-estimation

When robustifying the Gauss-Newton method used to find a local minimum of a squared-error function ($\Delta \mathbf{z}(\mathbf{x})$ is the error vector):
$$f(\mathbf{x})=\frac{1}{2}\Delta\mathbf{z}(\mathbf{x})^T\Delta\mathbf{z}(\mathbf{x})$$
as explained in works like *Bundle Adjustment: A Modern Synthesis* (p. 17), we have to introduce a strong nonlinearity, denoted $\rho(\cdot)$ (e.g. the Huber loss), into this cost function.

To reduce the influence of outliers, we are therefore no longer interested in minimizing $f(\mathbf{x})$ itself, but its robustified version $\rho(f(\mathbf{x}))$.

With these considerations, the gradient vector ($\mathbf{g}$) and the approximate Hessian matrix ($\mathbf{H}$) given by the authors have the following form:
$$
\mathbf{g} = \rho'\mathbf{J}^T\Delta\mathbf{z},\quad\quad \mathbf{H} \approx \mathbf{J}^T(\rho'\,\mathbf{I} + 2\rho''\Delta\mathbf{z} (\Delta\mathbf{z})^T)\mathbf{J}
$$

Here $\rho'$ and $\rho''$ are the first and second derivatives of $\rho$ evaluated at $f(\mathbf{x})$, and $\mathbf{J}$ is the Jacobian matrix $\mathbf{J} = \partial \Delta\mathbf{z}/\partial \mathbf{x}$.
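The gradient formula can be sanity-checked numerically against finite differences. Here is a minimal sketch; the linear residual $\Delta\mathbf{z}(\mathbf{x}) = A\mathbf{x} - \mathbf{b}$ and the robustifier $\rho(y) = \log(1+y)$ are toy choices of my own, not taken from the paper:

```python
import numpy as np

# Toy residual dz(x) = A x - b, so J = A; robustifier rho(y) = log(1 + y)
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([0.5, -0.3, 0.2])
x0 = np.array([0.1, -0.2])

dz = lambda x: A @ x - b
f = lambda x: 0.5 * dz(x) @ dz(x)
obj = lambda x: np.log1p(f(x))        # rho(f(x))

# Analytic gradient: g = rho'(f) * J^T dz, with rho'(y) = 1 / (1 + y)
rp = 1.0 / (1.0 + f(x0))
g = rp * A.T @ dz(x0)

# Central finite differences for comparison
h = 1e-6
g_fd = np.array([(obj(x0 + h * e) - obj(x0 - h * e)) / (2 * h)
                 for e in np.eye(2)])
print(np.max(np.abs(g - g_fd)))       # should be near zero
```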


Given this, what I don't understand is how this approximation of the Hessian matrix is computed.

My attempt to derive it is the following:
$$\begin{align}
\mathbf{H} &= \frac{\partial^2\rho(f(\mathbf{x}))}{\partial \mathbf{x}^2} = \frac{\partial\rho(f(\mathbf{x}))}{\partial f(\mathbf{x})}\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x}^2} + \frac{\partial^2\rho(f(\mathbf{x}))}{\partial f(\mathbf{x})^2}\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\left(\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\right)^T && \text{By the chain rule}
\end{align}
$$

Where:
$$\begin{align}
&\frac{\partial\rho(f(\mathbf{x}))}{\partial f(\mathbf{x})}\frac{\partial^2 f(\mathbf{x})}{\partial \mathbf{x}^2} \approx \rho'\mathbf{J}^T\mathbf{J}\quad\leftrightarrow\quad\text{Gauss-Newton approximation}\\
\\
&\frac{\partial^2\rho(f(\mathbf{x}))}{\partial f(\mathbf{x})^2}\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\left(\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}\right)^T = \rho''\,\mathbf{J}^T\Delta\mathbf{z} (\Delta\mathbf{z})^T\mathbf{J}\quad\leftrightarrow\quad\text{using } \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \mathbf{J}^T\Delta\mathbf{z}
\end{align}
$$

This gives me an approximate Hessian whose second term is not multiplied by 2:
$$
\mathbf{H}_{\text{attempt}} \approx \mathbf{J}^T(\rho'\,\mathbf{I} + \rho''\Delta\mathbf{z} (\Delta\mathbf{z})^T)\mathbf{J} \neq \mathbf{J}^T(\rho'\,\mathbf{I} + \color{red}{2}\rho''\Delta\mathbf{z} (\Delta\mathbf{z})^T)\mathbf{J}
$$

Can someone please help me see where I am taking a wrong step?

Thanks in advance!

Best Answer

You are right and the book is wrong.

We can try, e.g., $\Delta z(x) = x$ and $\rho(y) = e^y$. Then $f(x) = x^2/2$, the robustified objective is $e^{x^2/2}$, $J = 1$, and $\rho'(f(x)) = \rho''(f(x)) = \rho(f(x)) = e^{x^2/2}$. Taking derivatives of the objective directly with 1D calculus,

$$g = x e^{x^2/2}$$ $$H = e^{x^2/2} + x^2 e^{x^2/2} = (1+x^2)\,e^{x^2/2},$$ which has no factor of 2: it matches your $\mathbf{H}_{\text{attempt}} = \rho' J^2 + \rho'' (\Delta z)^2 J^2 = (1+x^2)\,e^{x^2/2}$, whereas the book's formula would give $(1+2x^2)\,e^{x^2/2}$.
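The same conclusion can be checked numerically in more than one dimension. Below is a minimal sketch with a toy linear residual of my own choosing (linear, so $\partial^2 f/\partial\mathbf{x}^2 = \mathbf{J}^T\mathbf{J}$ holds exactly and the only point of disagreement is the factor of 2): the factor-free formula matches a finite-difference Hessian of the robustified objective, while the book's version does not.

```python
import numpy as np

# Toy linear residual dz(x) = A x - b (so J = A, and the Gauss-Newton term is exact),
# with robustifier rho(y) = e^y as in the 1D example above.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([0.5, -0.3, 0.2])
x0 = np.array([0.1, -0.2])

dz = lambda x: A @ x - b
f = lambda x: 0.5 * dz(x) @ dz(x)
obj = lambda x: np.exp(f(x))          # rho(f(x)), with rho' = rho'' = e^f

r, J = dz(x0), A
rp = rpp = np.exp(f(x0))

H_no2 = J.T @ (rp * np.eye(3) + rpp * np.outer(r, r)) @ J
H_with2 = J.T @ (rp * np.eye(3) + 2 * rpp * np.outer(r, r)) @ J

# Central-difference Hessian of the robustified objective for reference
h = 1e-4
E = np.eye(2)
H_fd = np.array([[(obj(x0 + h*E[i] + h*E[j]) - obj(x0 + h*E[i] - h*E[j])
                   - obj(x0 - h*E[i] + h*E[j]) + obj(x0 - h*E[i] - h*E[j])) / (4*h*h)
                  for j in range(2)] for i in range(2)])

print(np.max(np.abs(H_fd - H_no2)))    # tiny: the factor-free formula is the true Hessian here
print(np.max(np.abs(H_fd - H_with2)))  # much larger: the extra factor of 2 overshoots
```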
