[Math] Gauss-Newton Method not converging for the function

Tags: jacobian, MATLAB, newton-raphson, nonlinear optimization, optimization

I need to solve the optimization problem $$ \min_{x\in \mathbb{R}^{3}}f(x) $$ where the function $f$ is defined as follows:
$$ f(x_{1}, x_{2}, x_{3}) = \frac{1}{2}\left[\left(2x_{1}-x_{2}x_{3}-1 \right)^{2}+\left(1-x_{1}+x_{2}-e^{x_{1}-x_{3}}\right)^{2}+\left( -x_{1}-2x_{2}+3x_{3}\right)^{2} \right]$$
using MATLAB code for the Gauss-Newton method.

I already know that the optimal point is $\hat{x}=(1,1,1)$, and my professor's instructions were to adapt my code for Newton's method by using the Jacobian approximation of the Hessian (i.e., using $\displaystyle H(j,k)\approx 2 \sum_{i=1}^{3}\left[ J(i,j)*J(i,k)\right]$).
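
In MATLAB terms (a sketch only, not my actual code, with J denoting the 3-by-3 Jacobian of the three residuals), that approximation amounts to:

H = zeros(3,3);
for j = 1:3
    for k = 1:3
        H(j,k) = 2*sum(J(:,j).*J(:,k));   % H(j,k) ~ 2*sum_i J(i,j)*J(i,k)
    end
end
% equivalently, in matrix form: H = 2*(J'*J)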

After doing this and running my code from the starting points $(0,0,0)$, $(1.5,0.5,0.5)$, $(2,2,1)$, and $(-1.5,0.5,-0.5)$, the only points for which it even appears to converge within 100 iterations are $(0,0,0)$ (it appears to converge after 11 iterations, but the algorithm does not stop there, as it did for Newton's method, which also converged early) and $(-1.5,0.5,-0.5)$ (it appears to converge after 19 iterations, but again the algorithm does not stop).

These points are not necessarily closer to the optimal point $(1,1,1)$ than the other starting points. So I looked at the condition number of $J^{T}*J$, because I have read that a large condition number of this matrix can prevent Gauss-Newton from converging.

I calculated the Jacobian Matrix $J$ and its transpose $J^{T}$ at the optimal point of $\hat{x}=(1,1,1)$ and obtained $J = \begin{pmatrix} 9 & -2 & -7 \\ -2 & 6 & -4 \\ -7 & -4 & 11 \end{pmatrix}$, $J^{T} = \begin{pmatrix}9 & -2 & -7 \\ -2 & 6 & -4 \\ -7 & -4 & 11 \end{pmatrix}$ (So, $J$ is a symmetric matrix).

Therefore, $J^{T}*J = \begin{pmatrix} 134 & -2 & -132 \\ -2 & 56 & -54 \\ -132 & -54 & 186 \end{pmatrix}$.

Using MATLAB, I calculated the three eigenvalues of this matrix to be $\lambda_{1} = -0.0000$ (I assume this means it is something negative but extremely close to zero?), $\lambda_{2} = 74.6686$, $\lambda_{3} = 301.3314$.

First, I calculated the condition number manually as the ratio $\displaystyle \text{max eigenvalue}/\text{min eigenvalue} = \frac{301.3314}{-0.0000} \approx \frac{301.3314}{0}$, which is infinite.

Then, according to the MATLAB function cond(), the condition number of $J^{T}*J$ is $6.4120e+16$, which is large but not infinite.
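
For reference, the check amounts to something like this in MATLAB (the numbers in the comments are the values quoted above):

A = [134 -2 -132; -2 56 -54; -132 -54 186];  % the matrix J'*J above
eig(A)   % approximately -0.0000, 74.6686, 301.3314
cond(A)  % approximately 6.4120e+16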

Is this why Gauss-Newton is not converging for this problem? Is there another reason? Or is it supposed to converge, and am I just doing something wrong?

I thank you ahead of time for your time and patience, and very much look forward to learning how to figure out this problem.

Best Answer

Expanding $f$ to second order around $x_0$,

$$ f(x) = f(x_0)+\frac{\partial f(x_0)}{\partial x}(x-x_0) + \frac 12(x-x_0)^{\dagger}\frac{\partial^2f(x_0)}{\partial x^2}(x-x_0) + O(|x-x_0|^3) $$

then differentiating and setting the gradient to zero gives

$$ \nabla f(x)\approx\nabla f(x_0) + \frac{\partial^2f(x_0)}{\partial x^2}(x-x_0)\approx 0 $$

Solving this for $x$ gives the iteration

$$ x_k = x_{k-1}-H_{k-1}^{-1}\nabla f(x_{k-1}) $$

and, because $f(x) = \frac 12\left[g_1(x)^2+g_2(x)^2+g_3(x)^2\right]$ is a sum of squares,

$$ \nabla f(x) = J(x)^{\dagger}\cdot G(x),\qquad H = \frac{\partial^2f(x)}{\partial x^2}\approx J^{\dagger}\cdot J $$

where $G(x)$ collects the residuals $g_i(x)$ and $J(x)$ is its Jacobian, as detailed in the note below.

NOTE

Calling $G(x) = \{g_1(x),g_2(x),\cdots,g_m(x)\}^{\dagger}$ and $f(x) = \frac 12G^{\dagger}(x)\cdot G(x)$ we have

$$ \nabla f(x) = J(x)^{\dagger}\cdot G(x) $$

and

$$ \nabla^2 f(x) = H(x) = \nabla J(x)^{\dagger}\cdot G(x) + J(x)^{\dagger}\cdot \nabla G(x) = J(x)^{\dagger}\cdot J(x) +\sum_kg_k(x)\nabla^2 g_k(x) $$

So we can use the approximation

$$ H = \frac{\partial^2f(x)}{\partial x^2}\approx J(x)^{\dagger}\cdot J(x) $$
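
Note that at a zero-residual solution such as $\hat{x}=(1,1,1)$, where $G(\hat{x})=0$, the neglected term $\sum_k g_k(x)\nabla^2 g_k(x)$ vanishes, so the approximation is exact at the minimizer.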

Attached is a Mathematica script implementing the Gauss-Newton iterative procedure:

X = {x, y, z};
f[x_, y_, z_] := {{2 x - y z - 1}, {1 - x + y - Exp[x - z]}, {-x - 2 y + 3 z}}  (* residual vector G *)
g[x_, y_, z_] := D[f[x, y, z], {X}]  (* Jacobian of G *)
x0 = RandomReal[{-2, 2}];
y0 = RandomReal[{-2, 2}];
z0 = RandomReal[{-2, 2}];
X0 = {x0, y0, z0};  (* random starting point in [-2, 2]^3 *)
f0 = Flatten[f[x0, y0, z0], 1];
n = 20;  (* maximum number of iterations *)
error = 0.0001;  (* stopping tolerance on the step *)
path = {{0, x0, y0, z0, f0.f0}};

For[k = 1, k <= n, k++,
    Jf = Flatten[Transpose[g[x, y, z] /. Thread[X -> X0]], 1];  (* Jacobian J at X0 *)
    JR = Transpose[Transpose[f[x, y, z]].Jf] /. Thread[X -> X0];  (* gradient J^T G *)
    Hf = Transpose[Jf].Jf;  (* Gauss-Newton Hessian J^T J *)
    p = LinearSolve[Hf, -JR];  (* step: solve (J^T J) p = -J^T G *)
    If[Max[Abs[p]] < error, Break[]];
    X0 = Flatten[X0 + p];
    f0 = Flatten[f[x, y, z], 1] /. Thread[X -> X0];  (* residuals at the new point *)
    AppendTo[path, {k, x, y, z, f0.f0} /. Thread[X -> X0]]
]

with a typical outcome

$$ \left[ \begin{array}{ccccc} k & x & y & z & f^{\dagger}\cdot f\\ 0 & -0.18083 & -0.360527 & -1.54181 & 27.0262 \\ 1 & 0.419264 & -0.249572 & -0.0266268 & 2.23994 \\ 2 & 0.441999 & 0.376084 & 0.398056 & 0.101378 \\ 3 & 0.693323 & 0.692737 & 0.692933 & 0.00877097 \\ 4 & 0.846417 & 0.846417 & 0.846417 & 0.000556373 \\ 5 & 0.923209 & 0.923209 & 0.923209 & 0.0000347735 \\ 6 & 0.961604 & 0.961604 & 0.961604 & \text{2.1733}\cdot 10^{-6} \\ 7 & 0.980802 & 0.980802 & 0.980802 & \text{1.3583}\cdot 10^{-7} \\ 8 & 0.990401 & 0.990401 & 0.990401 & \text{8.4896}\cdot 10^{-9} \\ 9 & 0.995201 & 0.995201 & 0.995201 & \text{5.3060}\cdot 10^{-10} \\ 10 & 0.9976 & 0.9976 & 0.9976 & \text{3.3162}\cdot 10^{-11} \\ 11 & 0.9988 & 0.9988 & 0.9988 & \text{2.0726}\cdot 10^{-12} \\ 12 & 0.9994 & 0.9994 & 0.9994 & \text{1.2954}\cdot 10^{-13} \\ 13 & 0.9997 & 0.9997 & 0.9997 & \text{8.0963}\cdot 10^{-15} \\ 14 & 0.99985 & 0.99985 & 0.99985 & \text{5.0603}\cdot 10^{-16} \\ \end{array} \right] $$
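
Note that after the first few iterations the error $|x_k-1|$ roughly halves at each step, i.e. the convergence here is only linear (rate about $1/2$) rather than quadratic; this is presumably related to $J^{\dagger}\cdot J$ being (nearly) singular at the solution, and it explains why the iteration takes many steps to meet a tight tolerance rather than stopping quickly.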

Attached is a script in Octave:

function [X0,k] = GaussNewton(x0,y0,z0,n,error)
  format long   
  %n     = 20;
  %error = 0.0001;
  %x0    = 0;
  %y0    = 0;
  %z0    = 0;

  X0 = [x0;y0;z0];

  for k = 1:n
    x0 = X0(1);
    y0 = X0(2);
    z0 = X0(3);
    Jf = g(x0,y0,z0);          % Jacobian J at the current point
    JR = (f(x0,y0,z0)'*Jf)';   % gradient J'*G
    Hf = Jf'*Jf;               % Gauss-Newton Hessian approximation J'*J
    p  = -Hf\JR;               % step: solve (J'*J)*p = -J'*G (avoids inv())
    if max(abs(p)) < error
      break;
    end
    X0 = X0 + p;
  end

end

function fx = f(x,y,z)
  % residual vector G(x) = [g1; g2; g3]
  fx = [2.*x-y*z-1.;1.-x+y-exp(x-z);-x-2.*y+3.*z];
end

function dfx = g(x,y,z)
  % Jacobian of G(x)
  dfx = [2.,-z,-y;-exp(x-z)-1.,1.,exp(x-z);-1.,-2.,3.];
end