[Math] Gradient-descent algorithm always converges to the closest local optimum

convex-optimization, convex-analysis, optimization

Assume $f(\vec x)$ is Lipschitz continuous and has two local minima, $\vec x_1^*$ and $\vec x_2^*$, where $\vec x_1^*$ is the global minimum. We start the gradient-descent algorithm from $\vec x_0$ with $\left\|\vec x_0 - \vec x_1^*\right\| \leq \left\|\vec x_0 - \vec x_2^*\right\|$. Additionally, we know that $f(\vec x)$ is convex between $\vec x_0$ and $\vec x_1^*$, but not convex between $\vec x_1^*$ and $\vec x_2^*$.

Can we say that the gradient-descent algorithm initialized at $\vec x_0$ always converges to $\vec x_1^*$ rather than $\vec x_2^*$? Is there any theorem to support this? Is there any other way to show this?
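For concreteness, by "gradient descent" I mean the standard fixed-step iteration (the step size $\eta > 0$ is an assumption here; the question leaves it unspecified):

$$\vec x_{k+1} = \vec x_k - \eta\, \nabla f(\vec x_k), \qquad k = 0, 1, 2, \dots$$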

Best Answer

The distance between pairs of points has nothing to do with the direction of the gradient of the function along trajectories from $\vec x_0$ to the two points $\vec x_1^*, \vec x_2^*$. You could have $\|\vec x_0 - \vec x_1^*\| < \|\vec x_0 - \vec x_2^*\|$ and yet the local gradient at $\vec x_0$ points toward $\vec x_2^*$ instead of $\vec x_1^*$. Convexity, or the lack of it, in the region between $\vec x_0$ and $\vec x_1^*, \vec x_2^*$ does not settle it either, because gradient descent is a local method that relies only on gradients, not on second derivatives. You could very well converge to $\vec x_2^*$ with gradient descent; perhaps not with more sophisticated methods.
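A minimal one-dimensional sketch of this failure mode. The double-well $f$ below is a hypothetical example, not from the question, and it deliberately violates the convexity assumption between $\vec x_0$ and the nearer minimum: $x_0$ sits just past a local maximum, which is exactly what lets the gradient point the "wrong" way.

```python
# Hypothetical counterexample to "gradient descent converges to the
# nearest minimum".  Here f'(x) = x (x - 1) (x - 4), so f has local
# minima at x = 0 and x = 4 and a local maximum at x = 1.
def f(x):
    return x**4 / 4 - 5 * x**3 / 3 + 2 * x**2

def grad_f(x):
    return x * (x - 1) * (x - 4)

x0 = 1.5          # |x0 - 0| = 1.5  <  |x0 - 4| = 2.5, so 0 is the *nearer* minimum
x, lr = x0, 0.05  # plain fixed-step gradient descent (step size is an assumption)
for _ in range(1000):
    x -= lr * grad_f(x)

print(f"started at {x0}, converged to x = {x:.4f}")  # -> about 4.0, the farther minimum
```

Because $x_0 = 1.5$ lies to the right of the local maximum at $x = 1$, the gradient at $x_0$ already points toward $x = 4$, and the distance to either minimum never enters the update. (In this toy example the farther well also happens to be the deeper one, but the depth plays no role in the iteration either.)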
