Finding a relationship between the gradient and the Hessian to improve the convergence rate

Tags: nonlinear optimization, numerical optimization, optimization, ordinary differential equations

I'm asking optimization experts for help:

The article "Finite-Time Convergence in Continuous-Time Optimization" (http://proceedings.mlr.press/v119/romero20b/romero20b.pdf) contains Section 4, "Second-Order Convergent Flows", which proposes several ways to relate the gradient $\nabla f(x)$ to the Hessian $\nabla^2 f(x)$ and thereby influence the convergence of the algorithm (formulas (27)–(30)).

What I'm most interested in is a relationship that significantly improves the convergence rate. I experimented with the following one:

$\dot{x}=\frac{\nabla f(x)}{\sqrt{(\nabla f(x))^2+(\nabla^2 f(x))^2}}$

and found that the speed increases only at those times when $(\nabla^2 f(x))^2 \lesssim 0.1$.
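For concreteness, here is a minimal sketch of the kind of experiment I mean (an illustration, not the actual run): forward-Euler integration of the flow on a one-dimensional test function $f(x)=\log\cosh(x)$, where the squares are ordinary scalar squares and I take the descent sign convention $\dot{x} = -\nabla f(x)/\sqrt{\cdots}$. The test function and step size are assumptions chosen only for illustration.

```python
# 1-D sketch: forward-Euler integration of dx/dt = -f'(x) / sqrt(f'(x)^2 + f''(x)^2),
# interpreting the squares as scalar squares (they would be norms in higher dimensions).
import numpy as np

def grad(x):   # f(x) = log(cosh(x))  ->  f'(x) = tanh(x)
    return np.tanh(x)

def hess(x):   # f''(x) = 1 / cosh(x)^2
    return 1.0 / np.cosh(x) ** 2

def flow_step(x, dt=1e-2):
    g, h = grad(x), hess(x)
    return x - dt * g / np.sqrt(g**2 + h**2 + 1e-12)  # small eps guards against 0/0

x = 3.0
for _ in range(2000):
    x = flow_step(x)
print(x)  # approaches the minimizer x* = 0; progress slows once hess(x)^2 grows near 0
```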

To address this, I introduced an additional scaling factor $\frac{1}{k}$ (where $k \gg 1$):

$\dot{x}=\frac{\nabla f(x)}{\sqrt{(\nabla f(x))^2+\frac{1}{k}(\nabla^2 f(x))^2}}$
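In the sketch above only the Euler step changes (reusing `grad` and `hess`; $k = 100$ is just an illustrative value):

```python
def scaled_flow_step(x, k=100.0, dt=1e-2):
    g, h = grad(x), hess(x)
    # down-weighting the Hessian term keeps the speed close to the
    # normalized-gradient flow even where hess(x)^2 is not small
    return x - dt * g / np.sqrt(g**2 + h**2 / k + 1e-12)
```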

But I don't like having to tune this factor by hand. What other ways are there (simpler ones, and not involving the inverse Hessian) to improve the convergence rate by linking the gradient and the Hessian?

I would be grateful for any advice and help.

Best Answer

Firstly, either I misunderstand you or the term $(\nabla^2 f(x))^2$ makes no sense in this context: it is a matrix quantity, not a scalar.

Similarly for $(\nabla f(x))^2$: the square of a vector is not defined.

Perhaps you mean $\|\nabla f(x)\|^2$ and $\|\nabla^2 f(x)\|^2$?
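In code, these scalars could be computed as follows (a NumPy sketch, assuming the gradient `g` and Hessian `H` are already available as arrays; the Frobenius norm is one possible matrix norm):

```python
import numpy as np

# hypothetical gradient vector and Hessian matrix at some point x
g = np.array([0.3, -1.2])
H = np.array([[2.0, 0.1],
              [0.1, 0.5]])

a1 = np.linalg.norm(g) ** 2               # ||grad f(x)||^2
a2 = np.linalg.norm(H, ord='fro') ** 2    # ||hess f(x)||_F^2 (Frobenius norm;
                                          # np.linalg.norm(H, 2) gives the spectral norm)
```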

Secondly, to resolve the $k$ parameter issue, you can use some kind of softmax function.

A typical choice would be $$w_i = \frac{\exp(-a_i^k/\sigma^k)}{\displaystyle\sum_{j} \exp(-a_j^k/\sigma^k)},$$

where $$a_1 = \|\nabla f(x)\|^2, \qquad a_2 = \|\nabla^2 f(x)\|^2.$$
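A minimal sketch of this weighting (the values of $k$ and $\sigma$, and the way the weights re-enter the flow's denominator, are assumptions rather than prescriptions):

```python
import numpy as np

def softmax_weights(a, k=2, sigma=1.0):
    """Weights exp(-(a_i/sigma)^k) / sum_j exp(-(a_j/sigma)^k) for non-negative a_i."""
    z = np.exp(-(np.asarray(a, dtype=float) / sigma) ** k)
    return z / z.sum()

a1, a2 = 0.9, 4.0                   # e.g. ||grad f(x)||^2 and ||hess f(x)||_F^2
w1, w2 = softmax_weights([a1, a2])  # the larger term is automatically down-weighted
# one possible use: replace the hand-tuned 1/k factor by the adaptive weights
denom = np.sqrt(w1 * a1 + w2 * a2)
```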