Derivation of Gauss-Newton method for underdetermined inverse problems

numerical methods, numerical optimization

I would like to derive the Gauss-Newton method for solving underdetermined inverse problems, i.e., where the number of parameters is larger than the number of constraints. Although my final solution works, I do not understand how the method can be derived for such underdetermined problems:

Let $f$ be a scalar objective function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ that we wish to minimize using the Gauss-Newton algorithm, i.e.,
$$f=\sum_{i=1}^m r_i^2(\underline{\alpha}),$$
where $\underline{\alpha} \in \mathbb{R}^n$ is a vector of $n$ parameters, $r_i$ describes the $i$-th constraint (residual) and $m$ is the number of constraints. To the best of my knowledge and according to Wikipedia, the Gauss-Newton algorithm can be derived directly from Newton's method, i.e.,
$$\underline{\alpha}^{\nu+1} = \underline{\alpha}^{\nu} - \underline{\underline{H}}^{-1}\underline{g},\tag{1}\label{eq1}$$
where $\underline{g}$ and $\underline{\underline{H}}$ are the gradient vector and Hessian matrix of $f$. The gradient vector in index notation is given as
$$g_j=2\sum_{i=1}^m r_i \frac{\partial r_i}{\partial \alpha_j}$$
and the Hessian is
$$H_{jk}=2\sum_{i=1}^m\left(\frac{\partial r_i}{\partial \alpha_j} \frac{\partial r_i}{\partial \alpha_k} +r_i \frac{\partial^2 r_i}{\partial \alpha_j \partial \alpha_k}\right).$$
In order to obtain the Gauss-Newton method, we neglect the second-order derivative terms in the Hessian. Thus,
$$\underline{g} = 2 \underline{\underline{J}}_r^T \underline{r}$$
and
$$\underline{\underline{H}} \approx 2 \underline{\underline{J}}_r^T \underline{\underline{J}}_r,$$
where $\underline{\underline{J}}_r$ is the Jacobian of the residual vector $\underline{r}\in\mathbb{R}^m$. With this, we can now write the Gauss-Newton algorithm, i.e.,
$$\underline{\alpha}^{\nu+1} = \underline{\alpha}^{\nu} - \left (\underline{\underline{J}}_r^T \underline{\underline{J}}_r \right)^{-1} \underline{\underline{J}}_r^T \underline{r}.\tag{2}\label{eq2}$$
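For concreteness, here is a minimal sketch of one iteration of Equation \ref{eq2} as I implement it; the residual function, its Jacobian and the starting point are just toy placeholders:

```python
import numpy as np

def gauss_newton_step(residual, jacobian, alpha):
    """One Gauss-Newton step (Eq. 2) via the normal equations, for m >= n."""
    r = residual(alpha)          # residual vector, shape (m,)
    J = jacobian(alpha)          # Jacobian dr_i/dalpha_j, shape (m, n)
    # g = 2 J^T r and H ~ 2 J^T J; the factors of 2 cancel in H^{-1} g.
    delta = np.linalg.solve(J.T @ J, J.T @ r)
    return alpha - delta

# Toy example with m = 3 residuals and n = 2 parameters.
def residual(alpha):
    return np.array([alpha[0] - 1.0,
                     alpha[1] - 2.0,
                     alpha[0] * alpha[1] - 2.0])

def jacobian(alpha):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [alpha[1], alpha[0]]])

alpha = np.array([0.5, 0.5])
for _ in range(20):
    alpha = gauss_newton_step(residual, jacobian, alpha)
print(alpha)   # converges to (1, 2) for this toy problem
```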
For the usual case with $m>n$ (overdetermined problem), this algorithm works perfectly fine for me. However, if $m<n$ (underdetermined inverse problem), $\underline{\underline{J}}_r^T \underline{\underline{J}}_r$ has rank at most $m<n$, so it is singular and cannot be inverted. I realized that I can overcome this problem if I replace the left pseudoinverse in Equation \ref{eq2} by the right pseudoinverse, i.e.,
$$\underline{\alpha}^{\nu+1} = \underline{\alpha}^{\nu} - \underline{\underline{J}}_r^T\left (\underline{\underline{J}}_r \underline{\underline{J}}^T_r \right)^{-1} \underline{r}.\tag{3}\label{eq3}$$
With this modification, my algorithm works perfectly fine and converges to a solution. However, I do not understand which assumptions are necessary to derive Equation \ref{eq3}, e.g., similarly to what was done above for Equation \ref{eq2} based on Newton's method (\ref{eq1}). I suspect that my derivation implicitly assumes $m>n$, but I do not know where.
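For reference, this is a minimal sketch of one iteration of Equation \ref{eq3} as I use it in the underdetermined case; the single-constraint residual and its Jacobian below are again just placeholders to make the snippet self-contained:

```python
import numpy as np

def gauss_newton_step_underdetermined(residual, jacobian, alpha):
    """One step of Eq. 3 (right pseudoinverse), for m < n with full-row-rank J."""
    r = residual(alpha)          # shape (m,)
    J = jacobian(alpha)          # shape (m, n), wide matrix
    # Solve (J J^T) p = r, then take the step delta = J^T p.
    p = np.linalg.solve(J @ J.T, r)
    return alpha - J.T @ p

# Toy example with m = 1 constraint and n = 2 parameters:
# find a point on the unit circle, starting from (2, 1).
residual = lambda a: np.array([a[0]**2 + a[1]**2 - 1.0])
jacobian = lambda a: np.array([[2.0 * a[0], 2.0 * a[1]]])

alpha = np.array([2.0, 1.0])
for _ in range(20):
    alpha = gauss_newton_step_underdetermined(residual, jacobian, alpha)
print(alpha, np.linalg.norm(alpha))   # lands on the circle, norm ~ 1
```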

Can anyone help me with this?

Best Answer

You have a system of equations $g(x) = 0$, where $g : \mathbb{R}^m \rightarrow \mathbb{R}^n$ and $n \leq m$, so that there are at least as many unknowns as there are constraints. (Note that the roles of $m$ and $n$ are swapped here relative to the question: $x \in \mathbb{R}^m$ collects the unknowns and there are $n$ constraints.) By Taylor's formula we have $$g(x+ \Delta x) \approx g(x) + Dg(x) \Delta x,$$ where $Dg(x) \in \mathbb{R}^{n \times m}$ is the Jacobian of $g$ at the point $x \in \mathbb{R}^m$. We observe that $Dg(x)$ is a wide matrix with at least as many columns as rows.

In order to derive Newton's method we seek $\Delta x \in \mathbb{R}^m$ such that $$g(x) + Dg(x) \Delta x = 0.$$ Suppose that such a $\Delta x$ exists. Then there are unique $y \in \text{Ker}(Dg(x))$ and $z \in \text{Ran}(Dg(x)^T)$ such that $$\Delta x = y + z.$$ Here $\text{Ker}(A)$ denotes the kernel (null space) of the linear transformation $A$ and $\text{Ran}(A)$ denotes the range (column space) of $A$. It is clear that $y$ is irrelevant, since $$g(x) + Dg(x)\Delta x = g(x) + Dg(x) z.$$

Now if $z \in \text{Ran}(Dg(x)^T)$, then $$z = Dg(x)^T p$$ for some $p \in \mathbb{R}^n$. It follows that we should consider the equation $$g(x) + Dg(x) Dg(x)^T p = 0.$$ If we assume that $Dg(x)$ has full row rank, then $Dg(x) Dg(x)^T \in \mathbb{R}^{n \times n}$ is nonsingular and this equation has a unique solution $p$. It follows that Newton's method takes the form $$x \gets x - Dg(x)^T(Dg(x)Dg(x)^T)^{-1} g(x).$$
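Under the full-row-rank assumption, this step is exactly the minimum-norm solution of the linearized equation $Dg(x)\,\Delta x = -g(x)$. A small numerical sketch, with a randomly generated wide matrix standing in for $Dg(x)$, that checks this against the minimum-norm solution returned by `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                         # n constraints, m unknowns (notation of this answer)
Dg = rng.standard_normal((n, m))    # wide Jacobian; full row rank with probability 1
g = rng.standard_normal(n)

# Newton step via the right pseudoinverse, as derived above:
# solve (Dg Dg^T) p = -g, then dx = Dg^T p lies in Ran(Dg^T).
p = np.linalg.solve(Dg @ Dg.T, -g)
dx = Dg.T @ p

# Minimum-norm solution of Dg dx = -g computed by lstsq.
dx_lstsq, *_ = np.linalg.lstsq(Dg, -g, rcond=None)

print(np.allclose(Dg @ dx, -g))     # True: the linearized equation is satisfied
print(np.allclose(dx, dx_lstsq))    # True: it is the minimum-norm such step
```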
