Gauss-Newton normal equations with norm of residual

Tags: computer vision, least squares, linear algebra, nonlinear optimization

The Wikipedia definition of the Gauss-Newton algorithm uses the following scalar cost function:

$$
S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i^2(\boldsymbol{\beta}),
$$

where $r_i(\beta)$ are scalar functions of parameters $\beta$.

In this document, the residuals $r_i$ are defined as

$$
r_i(\boldsymbol{\beta}) = y_i - f(x_i, \boldsymbol{\beta}).
$$

This assumes that $y_i$ and $f(x_i, \beta)$ are both scalar.

But in some application domains, such as bundle adjustment, $y_i, f(x_i, \beta) \in \mathbb{R}^2$, since they represent 2D points on a sensor.

Therefore, $r_i(\beta) = \|y_i - f(x_i, \beta)\|$.

Can we still use the same Gauss-Newton normal equations to solve such a problem when $r_i$ is wrapped in some kind of distance metric like the Euclidean norm?

Thanks!

Best Answer

Suppose the functions $f_i : \mathbb R^k \to \mathbb R^p$ for $i = 1, \ldots, m$ are differentiable and the function $S:\mathbb R^k \to \mathbb R$ is defined by
$$
\tag{1} S(\beta) = \sum_{i=1}^m \| f_i(\beta) - y_i \|^2.
$$
Our goal is to minimize $S$.
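For concreteness, here is a minimal NumPy sketch of how the cost (1) is evaluated when each residual block is a vector in $\mathbb R^p$. The callables `fs[i]` and the arrays `ys[i]` are hypothetical stand-ins for the $f_i$ and $y_i$, not part of any particular library:

```python
import numpy as np

def cost(beta, fs, ys):
    """Evaluate S(beta) = sum_i ||f_i(beta) - y_i||^2.

    fs : list of callables, each mapping a length-k parameter array to a length-p array
    ys : list of length-p observation arrays
    """
    return sum(np.sum((f(beta) - y) ** 2) for f, y in zip(fs, ys))
```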

The way the Gauss-Newton method works is that at each iteration we linearize the functions $f_i$ and then solve the resulting least squares problem. Let $\beta^t$ be our estimate of $\beta$ at iteration $t$ of the Gauss-Newton method. (The $t$ is a superscript, not an exponent.) Then at iteration $t+1$ of the Gauss-Newton method we solve the least squares problem
$$
\text{minimize} \quad \sum_{i=1}^m \| f_i(\beta^t) + \underbrace{f_i'(\beta^t)}_{p \times k} (\beta - \beta^t) - y_i \|^2.
$$
The optimization variable is $\beta \in \mathbb R^k$. To solve this least squares problem, we can set the gradient equal to $0$, which yields
$$
\sum_{i=1}^m 2 \underbrace{f_i'(\beta^t)^T}_{k \times p} \big(f_i(\beta^t) + \underbrace{f_i'(\beta^t)}_{p \times k} (\beta - \beta^t) - y_i \big) = 0.
$$
The solution to this linear system is $\beta^{t+1}$.
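As a rough sketch of one such iteration (NumPy again; `fs[i]` and `jacs[i]` are assumed user-supplied callables returning $f_i(\beta)$ and the $p \times k$ Jacobian $f_i'(\beta)$), the summed system above can be assembled and solved like this:

```python
import numpy as np

def gauss_newton_step(beta_t, fs, jacs, ys):
    """One Gauss-Newton iteration:
    solve  (sum_i J_i^T J_i) delta = sum_i J_i^T (y_i - f_i(beta_t)),
    then return beta_{t+1} = beta_t + delta.
    """
    k = beta_t.size
    A = np.zeros((k, k))      # accumulates sum_i J_i^T J_i
    b = np.zeros(k)           # accumulates sum_i J_i^T (y_i - f_i(beta_t))
    for f, J, y in zip(fs, jacs, ys):
        Ji = J(beta_t)        # p x k Jacobian of f_i at beta_t
        ri = y - f(beta_t)    # p-vector residual
        A += Ji.T @ Ji
        b += Ji.T @ ri
    delta = np.linalg.solve(A, b)   # assumes the k x k system is nonsingular
    return beta_t + delta
```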


A slightly different (but equivalent) way to look at this is to define
$$
f(\beta) = \begin{bmatrix} f_1(\beta) \\ f_2(\beta) \\ \vdots \\ f_m(\beta) \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$
and note that equation (1) can be written equivalently as
$$
S(\beta) = \| f(\beta) - y \|^2.
$$
We can use the Gauss-Newton method to minimize $\| f(\beta) - y \|^2$. At each iteration of the Gauss-Newton method, we linearize $f$ and then solve the resulting least squares problem. Let $\beta^t$ be our estimate of $\beta$ at iteration $t$. The least squares problem that we solve at iteration $t+1$ is
$$
\text{minimize} \quad \| f(\beta^t) + f'(\beta^t)(\beta - \beta^t) - y \|^2.
$$
The optimization variable is $\beta \in \mathbb R^k$. We can solve this least squares problem by setting the gradient equal to $0$, which yields
$$
f'(\beta^t)^T \bigg( f(\beta^t) + f'(\beta^t)(\underbrace{\beta - \beta^t}_{\Delta \beta}) - y \bigg) = 0
$$
or equivalently
$$
\tag{2} f'(\beta^t)^T f'(\beta^t) \Delta \beta = f'(\beta^t)^T(y - f(\beta^t)).
$$
Our new estimate of $\beta$ is
$$
\beta^{t+1} = \beta^t + \Delta \beta,
$$
where $\Delta \beta$ is computed by solving the "normal equations" (2).
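A minimal end-to-end sketch of the stacked form, again in NumPy (here `f` and `jac` are assumed user-supplied callables returning the stacked residual vector $f(\beta)$ and its $(mp) \times k$ Jacobian $f'(\beta)$):

```python
import numpy as np

def gauss_newton(beta0, f, jac, y, iters=20, tol=1e-10):
    """Minimize ||f(beta) - y||^2 by repeatedly solving the normal equations (2)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(iters):
        J = jac(beta)                               # stacked Jacobian f'(beta)
        r = y - f(beta)                             # stacked residual y - f(beta)
        delta = np.linalg.solve(J.T @ J, J.T @ r)   # normal equations (2)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:             # stop once the update is negligible
            break
    return beta
```

In practice the linear subproblem is usually solved with a QR or Cholesky factorization rather than by forming $J^T J$ explicitly, and bundle adjustment implementations additionally exploit the sparse block structure of $J$ (e.g. via the Schur complement), but the update computed is the same $\Delta \beta$ as in (2).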
