Solved – Effects of step size in gradient descent optimisation

algorithms, optimization

I'm using a gradient-descent-based algorithm for my problem, where
new_value = old_value - Step_size * Gradient

For the exit criterion, I check the change in objective function value between iterations, i.e.,
if (old_Objective_fn_value - new_Objective_fn_value) <= 0.001, exit; otherwise continue.
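
In Python, the loop looks roughly like the sketch below (the objective, gradient, starting point, and iteration cap are placeholders, not my actual problem):

```python
def gradient_descent(objective, gradient, x0, step_size, tol=0.001, max_iter=10000):
    """Plain gradient descent that exits when the decrease in the objective
    between successive iterations is at most tol (the criterion above)."""
    x = x0
    old_value = objective(x)
    new_value = old_value
    for _ in range(max_iter):
        x = x - step_size * gradient(x)   # update rule: new = old - Step_size * Gradient
        new_value = objective(x)
        if old_value - new_value <= tol:  # exit criterion from above
            break
        old_value = new_value
    return x, new_value
```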

For different values of Step_size, the algorithm meets the exit criterion at a different point. For example, when my Step_size is x the final objective function value is p, and when my Step_size is y the final objective function value is q.
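
To illustrate, running the sketch above on a made-up one-dimensional objective with two local minima typically stops at different final objective values for different step sizes (the function, starting point, and step sizes here are purely illustrative):

```python
# Hypothetical 1-D objective with two local minima: f(x) = x^4 - 3x^2 + x.
f  = lambda x: x**4 - 3*x**2 + x
df = lambda x: 4*x**3 - 6*x + 1

for step in (0.01, 0.1):   # two different Step_size values
    x_final, f_final = gradient_descent(f, df, x0=2.0, step_size=step)
    print(f"Step_size={step}: stopped at x={x_final:.3f}, objective={f_final:.3f}")

# The small step settles in the valley nearest the starting point, while the
# larger step jumps past it and ends in the other valley, so the two runs
# report different final objective values under the same exit criterion.
```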

I would like to know the logical reason why the algorithm converges to different objective function values rather than to the same one.

How can we make the algorithm converge to the same objective function value irrespective of the step size with the same exit criterion?

Best Answer

You have encountered a known problem with gradient descent methods: large step sizes can cause you to overstep local minima. If your objective function has multiple local minima, a large step can carry you right through one valley and into the next, so different step sizes can end up in different valleys and therefore at different final objective function values. Note also that your exit criterion measures the per-iteration decrease, which shrinks with the step size, so a very small step can trigger the stopping test before a minimum is actually reached. This is a general limitation of plain gradient descent and cannot be fixed by the exit criterion alone. It is one reason why, for least-squares problems, gradient descent is combined with the (approximately second-order) Gauss-Newton method in the Levenberg-Marquardt algorithm, which adapts the effective step at each iteration.
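
For reference, here is a minimal sketch of the damped Levenberg-Marquardt update for a least-squares problem; the exponential model, variable names, and damping factor are illustrative assumptions, not part of the question:

```python
import numpy as np

def residuals(params, x, y):
    # Illustrative model y ≈ a * exp(b * x); residuals are model minus data.
    a, b = params
    return a * np.exp(b * x) - y

def jacobian(params, x):
    a, b = params
    return np.column_stack((np.exp(b * x),           # d residual / d a
                            a * x * np.exp(b * x)))  # d residual / d b

def lm_step(params, x, y, lam):
    """One damped update: lam -> 0 gives a Gauss-Newton step, while a large
    lam gives a short step along the negative gradient. A full LM routine
    also adapts lam each iteration based on whether the step reduced the
    residual norm."""
    r = residuals(params, x, y)
    J = jacobian(params, x)
    A = J.T @ J + lam * np.eye(J.shape[1])  # damped normal-equation matrix
    g = J.T @ r                             # gradient of 0.5 * ||r||^2
    return params - np.linalg.solve(A, g)
```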
