Gradient direction / descent

multivariable-calculus, scalar-fields, vector-analysis

It has been a while since I studied multivariable calculus, so I need to refresh some results.

Given $f:\mathbb{R}^n\to\mathbb{R}$, at a stationary point $x$ (for example, a local minimum) the gradient is $\nabla f(x) = 0$. However, given that the gradient points in the direction in which $f$ increases the most, how can the gradient be zero at a local minimum?

Also, gradient descent uses this fact to find a local minimum: if $\nabla f$ points in the direction of maximum increase, then $-\nabla f$ points in the direction of maximum decrease.

How is that equivalent?

Best Answer

You are right that the gradient points in the direction in which $f$ increases the most. Likewise, $f$ decreases, at least for small steps, when moving from $x$ in any direction $u$ with negative directional derivative $\nabla f(x)\cdot u < 0$.
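Both "steepest increase" and "steepest decrease" follow from a single inequality. For a unit vector $u$ (so $\|u\| = 1$), the Cauchy–Schwarz inequality bounds the directional derivative $D_u f(x)$:

$$
\begin{aligned}
D_u f(x) &= \nabla f(x)\cdot u,\\
-\|\nabla f(x)\| \;\le\; \nabla f(x)\cdot u \;&\le\; \|\nabla f(x)\|.
\end{aligned}
$$

When $\nabla f(x) \neq 0$, the upper bound is attained exactly for $u = \nabla f(x)/\|\nabla f(x)\|$ and the lower bound exactly for $u = -\nabla f(x)/\|\nabla f(x)\|$. So "$\nabla f$ is the direction of steepest increase" and "$-\nabla f$ is the direction of steepest decrease" are the same statement read at opposite ends of the inequality.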

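$\nabla f(x) = 0$ at a local minimum or a local $\bf{maximum}$ (or at a saddle point, the higher-dimensional analogue of a stationary inflection point, but we can ignore that for now)! The phrase "the gradient points in the direction of steepest increase" only has content when $\nabla f(x) \neq 0$: the rate of increase of $f$ in a unit direction $u$ is $\nabla f(x)\cdot u$, whose largest possible value is $\|\nabla f(x)\|$. At a local extremum $\nabla f(x) = 0$, so this rate is zero in every direction and there is no direction of steepest increase for the gradient to point in.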
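A quick numerical check makes this concrete. The sketch below assumes the illustrative function $f(x, y) = x^2 + y^2$ (chosen for this example, not taken from the question), whose gradient is $(2x, 2y)$ and whose only local minimum is the origin. It samples unit directions $u$ and reports the largest directional derivative $\nabla f \cdot u$: away from the minimum this equals $\|\nabla f\|$ and is attained in the gradient direction, while at the minimum it is zero in every direction.

```python
import math

def grad_f(x, y):
    """Gradient of the illustrative function f(x, y) = x**2 + y**2."""
    return (2.0 * x, 2.0 * y)

def max_directional_derivative(x, y, samples=3600):
    """Sample unit directions u; return (largest grad . u, angle of that u in radians)."""
    gx, gy = grad_f(x, y)
    best_value, best_angle = -math.inf, 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        ux, uy = math.cos(theta), math.sin(theta)  # unit direction u
        value = gx * ux + gy * uy                   # directional derivative D_u f
        if value > best_value:
            best_value, best_angle = value, theta
    return best_value, best_angle

# Away from the minimum: the best rate matches the gradient norm,
# and the best direction matches the gradient direction (45 degrees here).
value, angle = max_directional_derivative(1.0, 1.0)
print(value, math.hypot(*grad_f(1.0, 1.0)))   # both about 2.828
print(angle, math.atan2(1.0, 1.0))            # both about 0.785

# At the minimum (0, 0) the gradient vanishes: no direction increases f to first order.
value0, _ = max_directional_derivative(0.0, 0.0)
print(value0)  # 0.0
```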

Why does gradient descent take us to a local minimum?

Because gradient descent steps in the direction of $-\nabla f$, evaluated at the current iterate $a_n$:

$$a_{n+1} = a_n - \lambda \nabla f(a_n), \qquad \lambda > 0$$

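Each new iterate $a_{n+1}$ is obtained from $a_n$ by a step of length $\lambda\|\nabla f(a_n)\|$ opposite to the direction of steepest increase, in other words in the direction of steepest decrease. For small enough $\lambda$ this lowers $f$ whenever $\nabla f(a_n) \neq 0$, and because the step length is proportional to $\|\nabla f(a_n)\|$, the steps shrink as the iterates approach a point where the gradient vanishes. That is how the two facts fit together: $-\nabla f$ tells you which way to go, and $\nabla f = 0$ tells you where to stop.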
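The update rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical test function $f(x, y) = (x-1)^2 + 2(y+3)^2$ with its minimum at $(1, -3)$, a fixed step size $\lambda = 0.1$ (the `step` parameter), and a simple stopping rule on the gradient norm; practical implementations usually choose the step size more carefully.

```python
import math

def f(p):
    """Hypothetical test function with its minimum at (1, -3)."""
    x, y = p
    return (x - 1.0) ** 2 + 2.0 * (y + 3.0) ** 2

def grad_f(p):
    """Analytic gradient of f."""
    x, y = p
    return (2.0 * (x - 1.0), 4.0 * (y + 3.0))

def gradient_descent(start, step=0.1, tol=1e-8, max_iter=10_000):
    """Iterate a_{n+1} = a_n - step * grad f(a_n) until the gradient is nearly zero."""
    a = start
    for n in range(max_iter):
        g = grad_f(a)
        if math.hypot(*g) < tol:   # (near-)stationary point: nothing left to descend
            return a, n
        a = (a[0] - step * g[0], a[1] - step * g[1])  # move along -grad f
    return a, max_iter

a_min, iterations = gradient_descent((5.0, 5.0))
print(a_min, iterations)  # a_min is close to (1.0, -3.0)
```

Near the minimum $\nabla f(a_n)$ is small, so the steps $\lambda\nabla f(a_n)$ shrink and the iterates settle where the gradient vanishes. The step size matters: for this particular function, with $\lambda \ge 0.5$ the $y$-coordinate overshoots and no longer converges.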
