Why study convex optimization for theoretical machine learning?

Tags: convex, machine-learning, optimization, transfer-learning

I am working on theoretical machine learning — on transfer learning, to be specific — for my Ph.D.

  • Out of curiosity, why should I take a course on convex optimization?

  • What take-aways from convex optimization can I use in my research on theoretical machine learning?

Best Answer

Machine learning algorithms use optimization all the time: we minimize a loss or error, or maximize some kind of score function. Gradient descent is the "hello world" optimization algorithm, covered in probably every machine learning course. This is obvious for regression or classification models, but even in tasks such as clustering we look for a solution that optimally fits the data (e.g. k-means minimizes the within-cluster sum of squares). So if you want to understand how machine learning algorithms work, learning more about optimization helps. Moreover, if you need to do things like hyperparameter tuning, you are directly using optimization as well.
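As an illustration, here is a minimal sketch of that "hello world" case: vanilla gradient descent minimizing an ordinary least-squares loss, which is convex. The data, learning rate, and iteration count are made up for the example.

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

# Design matrix with an intercept column
A = np.column_stack([x, np.ones_like(x)])

def loss(w):
    # Mean squared error: a convex function of w
    return np.mean((A @ w - y) ** 2)

def grad(w):
    # Gradient of the MSE with respect to w
    return 2.0 * A.T @ (A @ w - y) / len(y)

w = np.zeros(2)        # initial parameters (slope, intercept)
lr = 0.1               # learning rate
for _ in range(1000):
    w -= lr * grad(w)  # the basic gradient-descent update

print(w, loss(w))      # w should approach (2, 1)
```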

One could argue that convex optimization shouldn't be that interesting for machine learning, since instead of nicely convex functions we often encounter loss surfaces like the one below, which are far from convex.

(Figure: an example of a real-life, non-convex loss landscape. It looks like a very irregular valley in the mountains, with many ups and downs, smaller valleys, and peaks. Clearly non-convex.)

(source: https://www.cs.umd.edu/~tomg/projects/landscapes/ and arXiv:1712.09913)

Nonetheless, as mentioned in other answers, convex optimization is faster, simpler, and less computationally intensive; crucially, every local minimum of a convex function is a global minimum, so standard convergence guarantees apply. For example, gradient descent and similar algorithms are commonly used in machine learning, especially for neural networks, because they "work", scale, and are widely implemented in different software. Still, they are not the best we can get, and they have their pitfalls, as discussed in Ali Rahimi's talk at NIPS 2017.
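The practical difference is easy to demonstrate: on a convex objective, gradient descent reaches the same global minimum from any starting point, while on a non-convex one the answer depends on initialization. A small sketch with made-up one-dimensional functions:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.01, steps=5000):
    # Plain gradient descent from a given starting point
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def convex_grad(x):
    # f(x) = (x - 3)^2: convex, unique global minimum at x = 3
    return 2.0 * (x - 3.0)

def nonconvex_grad(x):
    # f(x) = x^4 - 3x^2 + x: two local minima, near x = -1.30 and x = 1.13
    return 4.0 * x**3 - 6.0 * x + 1.0

for x0 in (-4.0, 0.5, 4.0):
    print(x0,
          gradient_descent(convex_grad, x0),     # always ~3.0
          gradient_descent(nonconvex_grad, x0))  # depends on x0
```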

On the other hand, non-convex optimization algorithms such as evolutionary algorithms seem to be gaining more and more recognition in the ML community; for example, training neural networks by neuroevolution is an active recent research topic (see also arXiv:1712.07897).
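To make the idea concrete, below is a minimal sketch of neuroevolution: a simple (mu + lambda) evolution strategy that evolves the weights of a tiny network for the XOR problem instead of using backpropagation. The architecture, population sizes, and mutation scale are illustrative choices, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic toy problem that a linear model cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

N_PARAMS = 17  # 2*4 weights + 4 biases + 4 weights + 1 bias

def forward(params, x):
    # A tiny 2-4-1 network, parameters packed into one flat vector
    W1 = params[:8].reshape(2, 4)
    b1 = params[8:12]
    W2 = params[12:16]
    b2 = params[16]
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def fitness(params):
    # Negative mean squared error, so higher is better
    return -np.mean((forward(params, X) - y) ** 2)

mu, lam, sigma = 10, 50, 0.1  # parents, offspring, mutation scale
population = [rng.standard_normal(N_PARAMS) for _ in range(lam)]

for generation in range(300):
    # (mu + lambda) selection: the best parents survive unchanged,
    # and each offspring is a mutated copy of a random parent
    parents = sorted(population, key=fitness, reverse=True)[:mu]
    offspring = [parents[rng.integers(mu)] + sigma * rng.standard_normal(N_PARAMS)
                 for _ in range(lam)]
    population = parents + offspring

best = max(population, key=fitness)
print(fitness(best))              # should approach 0
print(forward(best, X).round(2))  # should approach [0, 1, 1, 0]
```

No gradients are computed anywhere; the only feedback the algorithm uses is the scalar fitness value, which is what makes such methods attractive for non-differentiable or highly non-convex objectives.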