[Math] Minimization and gradient descent

calculus, gradient descent, machine learning, numerical optimization

I am a bit puzzled by the gradient descent method.
My doubt is this: gradient descent is an iterative method for finding minima/maxima of a cost landscape, and it moves in the direction of steepest descent/ascent. But on the other hand, by setting my gradient vector equal to the zero vector I should get my optimal point. Then why use gradient descent at all?

Best Answer

setting my gradient vector equal to the zero vector I should get my optimal point.

More precisely: by setting the gradient vector equal to the zero vector and solving the resulting equation, you would find a critical point.
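
For instance (a toy example of my own, not from the question): if the landscape happens to be a simple quadratic, you really can take this route, i.e. write down the gradient, set it to zero, and solve exactly.

```python
# Minimal sketch with a made-up quadratic cost landscape f(x, y);
# the gradient equations are linear here, so they can be solved exactly.
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 + 2*y**2 + x*y - 4*x           # hypothetical cost function

grad = [sp.diff(f, v) for v in (x, y)]  # one vector equation = two scalar equations
critical_points = sp.solve(grad, [x, y], dict=True)
print(critical_points)                  # [{x: 16/7, y: -4/7}] -- the single critical point
```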

One equation between vectors in 2 dimensions is actually a system of 2 scalar equations (and in n dimensions, a system of n). These equations are in general nonlinear (unless your landscape happens to be a second-degree polynomial, which I doubt), so solving them by algebraic manipulation is rarely possible. You would have to use some numerical method, which is most likely iterative itself.
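
Gradient descent is exactly such an iterative method. Below is a minimal sketch of my own (Himmelblau's function as a stand-in landscape, an assumed fixed step size, and a crude stopping rule): instead of solving the gradient equations, which here are two coupled cubics, it repeatedly steps in the direction opposite to the gradient until the gradient is nearly zero.

```python
# Minimal gradient-descent sketch (toy setup: Himmelblau's function as the
# landscape, an assumed fixed step size, a crude stopping rule).
import numpy as np

def f(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad_f(p):
    # Gradient of f; setting this to zero gives two coupled cubic equations.
    x, y = p
    return np.array([4*x*(x**2 + y - 11) + 2*(x + y**2 - 7),
                     2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)])

p = np.array([0.0, 0.0])          # starting guess
eta = 0.01                        # step size (assumed; needs tuning in practice)
for _ in range(1000):
    g = grad_f(p)
    if np.linalg.norm(g) < 1e-8:  # stop once the gradient is (numerically) zero
        break
    p = p - eta * g

print(p, f(p))                    # approaches (3, 2), one of four local minima
```

A different starting guess can land in a different one of this function's four minima, which is exactly the local-versus-global issue raised below.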

In addition, equating the gradient to zero could just as well give you a maximum or a saddle point. Gradient descent, by contrast, tends to approach a minimum, though it can occasionally be fooled by a saddle point. There remains the issue of deciding whether the minimum you found is the global one... but that is an issue for every method.
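
To make the saddle-point remark concrete, here is a hypothetical example (not from the original answer): the function f(x, y) = x^4 - 2x^2 + y^2 has a vanishing gradient at three points, the minima (±1, 0) and the saddle (0, 0). Solving the gradient equations alone cannot tell these apart, while gradient descent from a generic starting point settles into one of the minima, and only gets stuck on the saddle if started exactly on its symmetry axis.

```python
# Hypothetical example: f(x, y) = x**4 - 2*x**2 + y**2.
# Its gradient vanishes at (-1, 0), (0, 0), (1, 0): two minima and one saddle.
import numpy as np

def grad(p):
    x, y = p
    return np.array([4*x**3 - 4*x, 2*y])

def descend(p, eta=0.05, steps=500):
    """Plain gradient descent with an assumed fixed step size eta."""
    p = np.array(p, dtype=float)
    for _ in range(steps):
        p = p - eta * grad(p)
    return p

print(descend([0.3, 1.0]))   # -> approx [ 1.  0.]  (a minimum)
print(descend([-0.3, 1.0]))  # -> approx [-1.  0.]  (the other minimum)
print(descend([0.0, 1.0]))   # -> approx [ 0.  0.]  (stuck exactly at the saddle)
```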