Solved – Can gradient descent be applied to non-convex functions

optimization

I'm just learning about optimization, and I'm having trouble understanding the difference between convex and non-convex optimization. From my understanding, a convex function is one where "the line segment between any two points on the graph of the function lies above or on the graph". In this case, a gradient descent algorithm could be used, because there is a single minimum and the gradient will always take you toward that minimum.

However, what about the function in this figure:

[Figure: a curve (red) with a single minimum; a line segment (blue) between two points on the curve dips below it]

Here, the blue line segment crosses below the red function. However, the function still has a single minimum, and so gradient descent would still take you to this minimum.

So my questions are:

1) Is the function in this figure convex, or non-convex?

2) If it is non-convex, then can convex optimization methods (gradient descent) still be applied?

Best Answer

The function you have graphed is indeed not convex. However, it is quasiconvex.
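The distinction can be checked numerically. As an illustration (using $f(x) = \sqrt{|x|}$, a standard example of a function that, like your graphed curve, is quasiconvex but not convex), convexity requires the chord inequality $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda) f(y)$, while quasiconvexity only requires $f(\lambda x + (1-\lambda)y) \le \max(f(x), f(y))$:

```python
import math

# f(x) = sqrt(|x|): quasiconvex (every sublevel set is an interval)
# but not convex, a stand-in for the graphed function.
f = lambda x: math.sqrt(abs(x))

x, y, lam = 0.0, 1.0, 0.5
mid = lam * x + (1 - lam) * y   # midpoint of the chord

# Convexity would require f(mid) <= lam*f(x) + (1-lam)*f(y): violated here.
print(f(mid), lam * f(x) + (1 - lam) * f(y))   # 0.707... > 0.5

# Quasiconvexity only requires f(mid) <= max(f(x), f(y)): holds.
print(f(mid) <= max(f(x), f(y)))               # True
```

The chord between $(0, 0)$ and $(1, 1)$ passes below the curve at the midpoint, so the function is not convex, yet every sublevel set is an interval, which is exactly quasiconvexity.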

Gradient descent is a generic method for continuous optimization, so it can be, and is very commonly, applied to nonconvex functions. With a smooth function and a reasonably selected step size, it will generate a sequence of points $x_1, x_2, \ldots$ with strictly decreasing values $f(x_1) > f(x_2) > \ldots$.

Under those conditions, gradient descent will eventually converge to a stationary point of the function, regardless of convexity. If the function is convex, this will be a global minimum, but if not, it could be a local minimum or even a saddle point.
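A minimal sketch of this behavior, on an illustrative non-convex function of my own choosing, $f(x) = x^4 - 3x^2 + x$, which has a local minimum near $x \approx 1.13$ and a global minimum near $x \approx -1.30$:

```python
# Gradient descent on the non-convex f(x) = x^4 - 3x^2 + x.
# Which stationary point we reach depends on the starting point.

def grad(x):
    # f'(x) = 4x^3 - 6x + 1
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(2.0))    # ~1.13: a local minimum, not the global one
print(descend(-2.0))   # ~-1.30: the global minimum
```

Both runs converge to a stationary point, exactly as claimed, but only one of them is the global minimum.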

Quasiconvex functions are an interesting case. Any local minimum of a quasiconvex function is also a global minimum, but quasiconvex functions can also have stationary points that are not local minima (take $f(x) = x^3$ for example). So it's theoretically possible for gradient descent to get stuck at such a stationary point and never progress to a global minimum. In your example, if the shoulder on the left side of the graph were to perfectly level out, gradient descent could get stuck there. However, variants such as the heavy-ball method might be able to "roll through" and reach the global minimum.
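A quick 1-D demonstration of both effects, using the $f(x) = x^3$ example above (the step size and momentum constant are arbitrary illustrative choices):

```python
# f(x) = x^3 is quasiconvex; its stationary point x = 0 is not a minimum.

def grad(x):
    # f'(x) = 3x^2, nonnegative everywhere
    return 3 * x**2

# Plain gradient descent stalls as it approaches the flat spot at x = 0.
x = 1.0
for _ in range(1000):
    x -= 0.1 * grad(x)
print(x)  # still slightly positive: stuck near the non-minimizing stationary point

# Heavy-ball (momentum) accumulates velocity and "rolls through" x = 0.
x, v = 1.0, 0.0
for _ in range(100):
    v = 0.9 * v - 0.1 * grad(x)
    x += v
    if x < -10:   # escaped past the stationary point; f decreases without bound
        break
print(x)  # well below 0: momentum carried it past the flat spot
```

Plain gradient descent slows to a crawl as the gradient vanishes near $x = 0$, while the momentum term lets the heavy-ball iterate pass through the flat region (here the run is cut off once it escapes, since $x^3$ is unbounded below).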
