Solved – When Maximising a Log Likelihood Function, Why Is the Derivative Set Equal to $0$

maximum likelihood

In derivations that maximise a log likelihood function, the partial derivative of the log likelihood is taken with respect to the parameter we want to estimate, and this partial derivative is then set equal to $0$ and solved for the parameter of interest.

At this point, why is the partial derivative set equal to $0$? Coming from a mathematics background, I was taught that this is not how you find the maximum; it is how you find the critical points.

I would greatly appreciate it if people could please clarify this.

An example is given below for the maximum likelihood estimator of $\sigma^2$ in a simple linear regression:

[Image: derivation of the maximum likelihood estimator of $\sigma^2$ in simple linear regression.]
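The screenshot itself is not reproduced here; the following is a sketch of the standard derivation under i.i.d. normal errors, which may differ in notation from the original image. The log likelihood is

$$
\ell(\beta_0,\beta_1,\sigma^2)
= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_i\right)^2,
$$

and setting the partial derivative with respect to $\sigma^2$ equal to $0$ gives

$$
\frac{\partial \ell}{\partial \sigma^2}
= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)^2 = 0
\quad\Longrightarrow\quad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i^{\,2},
$$

where $\hat\varepsilon_i = y_i-\hat\beta_0-\hat\beta_1 x_i$ are the residuals.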

Best Answer

To put @MatthewDrury's answer less rigorously but perhaps more simply:

As in high-school calculus, you find the maximum (or the minimum) of a function by setting its derivative equal to zero. Here the function is multivariate, so there are several parameters involved, but the idea is exactly the same: find the combination of parameters at which the function has derivative zero; that is where a maximum or minimum might be found.

You can look at the examples at https://www.mathsisfun.com/calculus/maxima-minima.html.
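For a one-variable illustration of the same idea (a toy example of my own, not taken from the linked page), consider

$$
f(x) = -(x-3)^2 + 5 .
$$

Setting $f'(x) = -2(x-3) = 0$ gives $x = 3$, and since $f''(x) = -2 < 0$ this critical point is the maximum. The likelihood case works the same way, just with one partial derivative per parameter.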

It's not always that easy; setting the derivative to zero is only part of the work if, for example, the function has multiple maxima or minima. But in this case there is exactly one combination of parameters that maximizes the likelihood. In linear regression, that is the same parameter combination as the least squares estimate.
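To make that last point concrete, here is a minimal Python sketch of my own (simulated data, made-up parameter values; none of it comes from the original answer). It maximises the Gaussian log likelihood numerically by minimising its negative with `scipy.optimize.minimize`, then compares the result with the closed-form least-squares fit and with $\hat\sigma^2 = \frac{1}{n}\sum_i \hat\varepsilon_i^2$:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data (made-up parameter values, purely for illustration).
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=3.0, size=n)

def neg_log_likelihood(params):
    """Negative Gaussian log-likelihood for y_i = b0 + b1*x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    b0, b1, log_sigma2 = params          # work with log(sigma^2) so sigma^2 stays positive
    sigma2 = np.exp(log_sigma2)
    resid = y - b0 - b1 * x
    return 0.5 * (n * np.log(2.0 * np.pi * sigma2) + resid @ resid / sigma2)

# The optimiser stops where the gradient of the negative log-likelihood is (numerically) zero:
# the same "set the derivative equal to zero" condition, solved numerically.
start = np.array([y.mean(), 0.0, np.log(y.var())])
result = minimize(neg_log_likelihood, x0=start, method="Nelder-Mead")
b0_mle, b1_mle = result.x[0], result.x[1]
sigma2_mle = np.exp(result.x[2])

# Closed-form least-squares fit for comparison.
b1_ols, b0_ols = np.polyfit(x, y, deg=1)
resid_ols = y - b0_ols - b1_ols * x
sigma2_hat = resid_ols @ resid_ols / n   # MLE of sigma^2 divides by n, not n - 2

print(f"MLE: b0={b0_mle:.3f}, b1={b1_mle:.3f}, sigma^2={sigma2_mle:.3f}")
print(f"OLS: b0={b0_ols:.3f}, b1={b1_ols:.3f}, sigma^2={sigma2_hat:.3f}")
```

On data like this the two slope/intercept estimates agree up to optimiser tolerance, and the likelihood-based variance estimate equals the residual sum of squares divided by $n$ (not the unbiased $n-2$ version).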