Maximum Likelihood – Why Minimize Negative Likelihood for Likelihood Maximization?

likelihood, maximum-likelihood

This question has puzzled me for a long time. I understand the use of 'log' in maximizing the likelihood, so I am not asking about 'log'.

My question is, since maximizing log likelihood is equivalent to minimizing "negative log likelihood" (NLL), why did we invent this NLL? Why don't we use the "positive likelihood" all the time? In what circumstances is NLL favored?

I found a little explanation here: https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/. It explains the equivalence in some depth, but it does not resolve my confusion.

Any explanation will be appreciated.

Best Answer

This is an alternative answer: optimizers in statistical packages usually work by minimizing a function. If your function returns the likelihood, it is convenient to take the logarithm first, which turns a product of many small probabilities into a sum and keeps the returned value in a numerically manageable range. Since the logarithm is a monotone transformation, the log likelihood and the likelihood increase and decrease together, so you can minimize the negative log likelihood and still obtain the maximum likelihood estimate of the model you are fitting. See, for example, the nlminb function in R.
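
For example, here is a minimal sketch of that workflow, assuming simulated normal data and a hand-written negative log likelihood (the data, starting values, and log-sd parameterization are illustrative choices of mine, not part of the original answer):

    # Minimal sketch: maximum likelihood for a normal model via nlminb,
    # which minimizes, so we hand it the negative log likelihood (NLL).
    set.seed(1)
    x <- rnorm(100, mean = 5, sd = 2)        # simulated data (assumption)

    # NLL as a function of (mu, log sigma); optimizing log(sd) keeps sd positive
    nll <- function(par) {
      mu    <- par[1]
      sigma <- exp(par[2])
      -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
    }

    fit <- nlminb(start = c(0, 0), objective = nll)
    c(mean = fit$par[1], sd = exp(fit$par[2]))   # MLEs, close to 5 and 2

The optimizer only ever minimizes; flipping the sign of the log likelihood is what lets it return the maximum likelihood estimates.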