Machine Learning – Why Minimize Cost Instead of Maximize Reward in Optimization?

expectation-maximization, loss-functions, machine learning, optimization

I understand that, for example, maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood. It is indeed a simple change, but still an extra step taken (it seems) for the sole purpose of designing a loss function that will be minimized rather than maximized.
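To make the equivalence explicit (using my own notation, with $L(\theta)$ denoting the likelihood), the two problems share the same optimizer because negation simply flips max into min:

$$
\hat{\theta} = \arg\max_{\theta} \log L(\theta) = \arg\min_{\theta} \bigl(-\log L(\theta)\bigr)
$$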

I wonder why this has become the standard in Machine Learning.

  • Is there any numerical consideration that favors function minimization instead of maximization?
  • Why has gradient descent become such a universal standard? I have never seen a Deep Learning paper that uses gradient ascent to directly maximize the likelihood. (See the sketch below.)
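For concreteness, here is a minimal sketch (my own toy example, not from the original question) showing that gradient ascent on the log-likelihood and gradient descent on the negative log-likelihood produce identical iterates. The Gaussian mean-estimation setup, the learning rate, and the helper `log_likelihood_grad` are all illustrative choices:

```python
import numpy as np

# Toy problem: estimate the mean of a Gaussian with known unit variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

def log_likelihood_grad(mu):
    # d/dmu of sum_i log N(x_i | mu, 1) = sum_i (x_i - mu)
    return np.sum(x - mu)

lr = 0.001
mu_ascent, mu_descent = 0.0, 0.0
for _ in range(500):
    # Gradient ASCENT on the log-likelihood
    mu_ascent += lr * log_likelihood_grad(mu_ascent)
    # Gradient DESCENT on the negative log-likelihood (same update, sign flipped twice)
    mu_descent -= lr * (-log_likelihood_grad(mu_descent))

print(mu_ascent, mu_descent)  # identical values, both close to the sample mean
```

The point of the sketch is that the two procedures differ only in bookkeeping: negating the objective and negating the update direction cancel out, so there is no numerical reason to prefer one over the other.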

Disclaimer:

I came across many similar questions, but none of them have been truly answered. People typically just explain that the two approaches are equivalent, or why we use the logarithm for numerical stability, without explaining why minimization is favored over maximization. (See these two questions: 1, 2)

Best Answer

It's my understanding that the only reason for this distinction is that in numerical analysis, it's standard to talk about convex optimization rather than concave optimization, even though they are really the same procedure. For example, if you do a Google Scholar search for "concave optimization", you get about 300,000 hits, but "convex optimization" gets about 2,000,000.

Because convex optimization is talked about more in the numerical analysis literature, this nomenclature is followed in the machine learning community.

As you state, the differences are trivial, so the reason for the distinction is trivial.