Solved – gradient descent and local maximum

Tags: gradient, gradient-descent, hessian, machine-learning, optimization

I read that gradient descent always converges to a local minimum, while for other methods such as Newton's method this is not guaranteed (if the Hessian is not positive definite). But if the starting point of GD happens to be a local maximum (where the derivative is zero), how can we say that it converges to a minimum? Doesn't it just stall?

Best Answer

If gradient descent is initialized at a local maximum (or a saddle point, or a local minimum), the gradient there is zero and it will simply stay stuck. Variations of GD, such as stochastic GD and mini-batch GD, try to work around this by adding an element of randomness to the search, but even those aren't guaranteed to escape a zero-gradient region if the loss surface is badly behaved enough.
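Here is a minimal sketch of that behaviour (not from the original answer): it uses the toy function f(x) = (x² − 1)², which has minima at x = ±1 and a local maximum at x = 0, and shows that starting exactly at the maximum leaves the iterate frozen while a tiny offset converges to a minimum. The function, learning rate, and step count are all arbitrary choices for illustration.

```python
# Toy example: f(x) = (x**2 - 1)**2 has minima at x = -1 and x = +1
# and a local maximum at x = 0, where the gradient is exactly zero.

def grad(x):
    return 4 * x**3 - 4 * x      # derivative of (x**2 - 1)**2

def gradient_descent(x0, lr=0.05, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)        # plain gradient descent update
    return x

print(gradient_descent(0.0))     # 0.0 -- gradient is zero at the maximum, never moves
print(gradient_descent(0.1))     # ~1.0 -- a small offset is enough to reach a minimum
```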

In practice, the only way out is to reinitialize the search with new weights or parameters that start in a completely different region of the search space. This isn't hard to do: if you do get stuck in such a zero-gradient area, you will quickly notice that your training error isn't changing at all, and you will know that you need to start over.
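A minimal sketch of that restart idea (again not from the original answer), reusing the grad helper from the snippet above: whenever the loss stops changing, abandon the run and draw a fresh random starting point. The restart count, sampling range, and tolerance are arbitrary illustrative choices.

```python
import random

def f(x):
    return (x**2 - 1) ** 2

def descend_with_restarts(lr=0.05, steps=500, restarts=5, tol=1e-10):
    best_x, best_loss = None, float("inf")
    for _ in range(restarts):
        x = random.uniform(-2.0, 2.0)          # fresh random starting point
        prev_loss = f(x)
        for _ in range(steps):
            x -= lr * grad(x)
            loss = f(x)
            if abs(prev_loss - loss) < tol:    # "error isn't changing at all"
                break                          # give up on this run and restart
            prev_loss = loss
        if f(x) < best_loss:
            best_x, best_loss = x, f(x)
    return best_x, best_loss

print(descend_with_restarts())   # typically lands near x = +1 or x = -1 with loss ~ 0
```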
