Understanding Gradient Descent Algorithms

convex optimizationgradient descentnumerical optimizationoptimization

Given the gradient descent algorithm:
$$x_{i+1} = x_i – \alpha_iD_i\nabla f(x_i)$$
with $\alpha_i$ suffiently small and $D_i > 0$ (positive definite)

I was under the assumption and the algorithm will always converge to a global minimum if $f$ is convex or to a local minimum. Is this not the case?

Does $f$ have to be convex for a global minimum to be found? In general, is gradient descent really finding a stationary point? with further analysis being required to determine if it's a min/max/inflection?

Best Answer

There are some provisos (in addition to the obvious requirement that $f$ is sufficiently smooth).

  1. You cannot take $\alpha_i$ to be arbitrarily small. An easy counterexample is that if you always choose $\alpha_i=0$, clearly your algorithm just stays put at the initial guess and won't converge to any kind of critical point. You need $\alpha_i$ to both be small enough to guarantee that each step decreases the energy, yet also large enough to guarantee a sufficient decrease each step. (Take a look at some articles about line searches, or about the Wolfe condition in particular, for more information.)
  2. Similarly you cannot take $D_i$ to have arbitrarily large eigenvalues. For instance, consider what happens if you set $\alpha_i=1$ and $D_i = 2^i I$.
  3. It's possible for gradient descent to diverge to infinity. This can happen even if $f$ is convex (consider $f(x)=x$ for the simplest counterexample). In cases where gradient descent diverges, one can argue that there is a "minimum at infinity" but you need to be a bit careful about how to make this precise as formally gradient descent is not converging in these cases.

That said, suppose gradient descent does converge to some point. This point might be a saddle point, or even a local maximum (if $x_0$ is a saddle point, you will never leave it, for example). But it's very unlikely. You will converge to a local minimum for almost all starting points and moreover if you do converge to a saddle point, you can "slide off" by applying a small random perturbation to your solution and solving again.

If $f$ is convex, all critical points of $f$ are global minima and so, if gradient descent converges, it does so to a global minimum. You might also converge to a global minimum if $f$ is not convex. Depends on the initial guess, step size, preconditioner, line search strategy... there are no guarantees.

Related Question