Understanding the Gauss-Newton Method

jacobian · least squares · nonlinear optimization

I have successfully applied the Gauss-Newton method to the simple nonlinear least-squares problem shown on the Wikipedia page here. As I understand it, the method uses the derivative of the objective function, which is the sum of squares of the residuals, in order to find its root. Therefore, by finding the root of the derivative, we also find where the minimum is (assuming we don't have any saddle points).

Hence my question is:

How do I check that the derivative of the objective function is zero at the optimum point in the Gauss-Newton method? Let me elaborate:

We have the following objective function, which we want to minimise over the parameter vector $\beta$:

$$S(\beta)=\sum_i \big(y_i-f(x_i,\beta)\big)^2$$

The Gauss-Newton iteration is therefore:

$$\beta^{(k+1)}=\beta^{(k)}+(\mathbf{J}^{\mathrm{T}}\mathbf{J})^{-1}\mathbf{J}^{\mathrm{T}}(\mathbf{y}-f(\mathbf{x},\beta^{(k)}))$$

where $\mathbf{J}_{ij}=\partial f(x_i,\beta)/\partial\beta_j$ is the Jacobian of the model, evaluated at $\beta^{(k)}$.
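To make the iteration concrete, here is a minimal Python/NumPy sketch of the update above. The model $f(x,\beta)=\beta_1 x/(\beta_2+x)$ and all names (`gauss_newton`, `f`, `jac`) are illustrative assumptions, not anything fixed by the question:

```python
import numpy as np

def gauss_newton(f, jac, x, y, beta0, n_iter=20):
    """Minimal Gauss-Newton loop:
    beta <- beta + (J^T J)^{-1} J^T (y - f(x, beta)),
    where J is the Jacobian of the model f with respect to beta."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, beta)                         # residuals y_i - f(x_i, beta)
        J = jac(x, beta)                           # J_ij = d f(x_i, beta) / d beta_j
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # normal equations for the step
        beta = beta + delta
    return beta

# Illustrative model f(x, beta) = beta_1 * x / (beta_2 + x) and its Jacobian.
def f(x, beta):
    return beta[0] * x / (beta[1] + x)

def jac(x, beta):
    return np.column_stack([x / (beta[1] + x),
                            -beta[0] * x / (beta[1] + x) ** 2])
```

In practice one would solve the linear subproblem with `np.linalg.lstsq(J, r)` rather than forming $\mathbf{J}^{\mathrm{T}}\mathbf{J}$ explicitly, since the normal equations square the condition number.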

By iterating the above several times, we find the optimum parameter vector $\beta_o$ at which $S(\beta_o)$ is minimal. Shouldn't each column of $\mathbf{J}(\beta_o)$ then sum to zero (or at least to a very small number, allowing for numerical error)?

I have checked this in my algorithm, which converges correctly to the known solution, and it does not hold. So, to improve my understanding of this method, I am wondering how to check that the derivative is zero at the optimum point in the Gauss-Newton method.

Thanks for your help in advance.

Best Answer

Let's comment on the algorithm in general, using the notation of your Wikipedia link, where $J_{ij}=\frac{\partial r_i}{\partial\beta_j}$ is the Jacobian of the residuals. Then

$$\frac{\partial}{\partial\beta_j}\sum_i r_i^2=2\sum_i J_{ij}r_i=2(J^{\mathrm{T}}r)_j,$$

or in vector notation $\nabla\sum_i r_i^2=2J^{\mathrm{T}}r$. So while minimising this sum of squares forces $J^{\mathrm{T}}r=0$, it does not require the entries of $J$ themselves to be especially small, nor its columns to sum to zero. (Only in the special case where all $r_i$ share the same nonzero value does $J^{\mathrm{T}}r=0$ give $\sum_i J_{ij}=0$ for each $j$, i.e. each column summing to $0$.) So if you want a sanity check, compute $J^{\mathrm{T}}r$ at the algorithm's estimate of $\beta$ and verify that it is close to zero.
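Continuing the sketch above, that check might look like this; the synthetic data and starting point `beta0` are made-up for illustration. (Whether $J$ denotes the Jacobian of the residuals, as in this answer, or of the model, as in the question, only flips the sign of $J^{\mathrm{T}}r$, so the check is the same either way.)

```python
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 40)
y = f(x, [1.0, 0.5]) + 0.01 * rng.standard_normal(x.size)  # made-up noisy data

beta_hat = gauss_newton(f, jac, x, y, beta0=[0.9, 0.2])

r = y - f(x, beta_hat)   # residuals at the estimate
J = jac(x, beta_hat)     # Jacobian at the estimate
print("||J^T r||        =", np.linalg.norm(J.T @ r))  # ~ 0 at the minimum
print("column sums of J =", J.sum(axis=0))            # generally NOT ~ 0
```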
