Which approaches exist for optimization in machine learning?

gradient descent, machine learning, optimization, regression

From this blog post:

For any Optimization problem with respect to Machine Learning, there can be either a numerical approach or an analytical approach. The numerical problems are Deterministic, meaning that they have a closed form solution which doesn’t change. Hence it is also called time invariant problems. These closed form solutions are solvable analytically. But these are not optimization problems.

My interpretation of this text is that the author sees two types of approaches to optimization problems in machine learning:

  • Numerical approach, for which we can compute solutions directly because they have a closed form solution (e.g. linear regression with least squares)
  • Optimization(?)/analytical approach, wherein we try to approximate a good solution (e.g. gradient descent)

This doesn't, however, seem correct. We analytically differentiate the sum of squares expression to obtain its closed form, right? The last sentence of the accepted answer in this post also seems to imply that gradient descent is a numerical method.
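
For concreteness, this is the standard derivation (with design matrix $X$ and targets $y$, and assuming $X^{\top}X$ is invertible): differentiating the sum of squares analytically and setting the gradient to zero gives a closed form.

$$
\min_{\beta} \lVert y - X\beta \rVert^2,
\qquad
\nabla_{\beta} \lVert y - X\beta \rVert^2 = -2X^{\top}(y - X\beta) = 0
\;\Longrightarrow\;
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y.
$$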

Question: Is the categorization above correct? If not, what would be the correct taxonomy for problems in machine learning?

Thank you in advance!

Best Answer

Towards Data Science isn't a reliable website, and the text you've quoted is, unfortunately, nonsense.

For any Optimization problem with respect to Machine Learning, there can be either a numerical approach or an analytical approach. The numerical problems are Deterministic, meaning that they have a closed form solution which doesn’t change. [...] These closed form solutions are solvable analytically. But these are not optimization problems.

What they meant to say, I hope, is that "analytical problems are Deterministic [...]", etc.

I won't explain the difference between analytic and numeric approaches in depth here, because there are lots of good sources on it, but judging by this paragraph, the post you read isn't one of them.


EDIT: OK, I'll explain a bit

Part of the problem is that there are a lot of partially overlapping terms. Very roughly speaking, you have:

  • Models where you can directly calculate the parameters: AKA closed-form solutions, analytical or analytic solutions, or sometimes algebraic solutions.
  • Models where you have to use an iterative algorithm to fit the parameters. All such methods are numerical, but
    • They might be deterministic (no randomness), like batch gradient descent with fixed starting points, or stochastic (random), like stochastic gradient descent.
    • They might always reach the best value (convex optimisation), or might risk getting stuck at local optima (non-convex optimisation); the sketch after this list illustrates these cases.
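
To make this concrete, here's a minimal sketch (plain NumPy on synthetic data; nothing here is from the quoted post) that fits the same linear-regression problem three ways: the closed-form normal equations, deterministic batch gradient descent, and stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression problem: y = X @ beta_true + noise
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# 1. Analytic / closed-form solution: the normal equations.
#    One direct calculation, no iteration.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# 2. Numerical but deterministic: batch gradient descent.
#    Fixed starting point + full-data gradient => same answer every run.
beta_gd = np.zeros(p)
lr = 0.01
for _ in range(5000):
    grad = -2 / n * X.T @ (y - X @ beta_gd)
    beta_gd -= lr * grad

# 3. Numerical and stochastic: SGD on one random example at a time.
#    The randomness means repeated runs give (slightly) different answers.
beta_sgd = np.zeros(p)
for _ in range(20000):
    i = rng.integers(n)
    grad_i = -2 * X[i] * (y[i] - X[i] @ beta_sgd)
    beta_sgd -= 0.005 * grad_i

print("closed form:", beta_closed)
print("batch GD:   ", beta_gd)
print("SGD:        ", beta_sgd)
```

Since least squares is convex, all three end up in essentially the same place; with a non-convex loss there is generally no closed form, and the iterative methods can get stuck at local optima.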

There are plenty of other ways to slice this up, but these distinctions should be enough to get you started!