Solved – Linear regression with Gaussian distribution

machine learning, maximum likelihood, normal distribution, regression, self-study

I am reading the book Machine Learning – A Probabilistic Perspective by Kevin P. Murphy.

In the chapter on linear regression, he introduces a method where you estimate the parameters of the Gaussian distribution via maximum likelihood estimation:

$$\hat{\theta} \triangleq \arg\max_{\theta} \log p(\mathcal{D} \mid \theta)$$

Instead of maximizing the likelihood, he minimizes the negative log-likelihood:

$$\mathrm{NLL}(\theta) \triangleq -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta)$$

This formula can be transformed to:

$$\mathrm{NLL}(\theta) = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - w^T x_i\right)^2 + \frac{N}{2}\log\left(2\pi\sigma^2\right)$$

I can still follow this.

However, afterwards he writes:

First, we rewrite the objective in a form that is more amenable to differentiation:

followed by this equation:

$$\mathrm{NLL}(w) = \frac{1}{2}(y - Xw)^T(y - Xw) = \frac{1}{2}\,w^T(X^T X)\,w - w^T(X^T y)$$

I don't understand how this equation equals $\mathrm{NLL}(\theta)$ from above. Where are all the $\sigma$, $\pi$, etc.?

I understand that he differentiates the function in order to obtain the optimal parameters, but I don't understand the transformation itself.

I hope this is enough to answer the question; I don't think I am allowed to post the whole chapter here, so I hope that someone has read the book.

Thanks in advance!

Best Answer

It looks like the author is rewriting the objective into a form that has the same minimizer in $w$ but is easier to differentiate. In other words, any $w$ that minimizes $\mathrm{NLL}(w)$ also minimizes the original $\mathrm{NLL}(\theta)$.
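To make the skipped step explicit, start from the transformed NLL quoted in the question and collect everything that does not depend on $w$ (treating $\sigma^2$ as a fixed constant):

$$
\mathrm{NLL}(\theta)
= \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - w^T x_i\right)^2 + \frac{N}{2}\log\left(2\pi\sigma^2\right)
= \frac{1}{\sigma^2}\underbrace{\left[\tfrac{1}{2}(y - Xw)^T(y - Xw)\right]}_{=\ \mathrm{NLL}(w)} \;+\; \underbrace{\frac{N}{2}\log\left(2\pi\sigma^2\right)}_{\text{constant in } w}
$$

where $X$ is the $N \times D$ design matrix with rows $x_i^T$ and $y = (y_1, \dots, y_N)^T$, so that $\sum_i (y_i - w^T x_i)^2 = (y - Xw)^T(y - Xw)$. The $\sigma$ and $\pi$ terms have not disappeared: they are absorbed into a positive multiplicative constant $1/\sigma^2$ and an additive constant, neither of which moves the minimizer in $w$. Expanding the quadratic and dropping the $w$-independent term $\frac{1}{2}y^T y$ gives the second form, $\frac{1}{2}w^T(X^T X)w - w^T(X^T y)$.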

The reason these simplifications work is that $\log$ is monotonically increasing, so maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and neither adding a constant nor multiplying by a positive constant (here $1/\sigma^2$) changes where the minimum in $w$ is attained.
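As a quick sanity check (not from the book), you can verify numerically that minimizing the full NLL over $w$ and minimizing the simplified objective $\frac{1}{2}(y - Xw)^T(y - Xw)$ recover the same weights. The synthetic data and function names below are just for illustration, with $\sigma$ treated as a known constant:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))          # design matrix
w_true = np.array([1.5, -2.0, 0.5])  # made-up ground-truth weights
sigma = 0.3                          # treated as a fixed, known constant
y = X @ w_true + sigma * rng.normal(size=N)

def full_nll(w):
    # NLL(theta) = 1/(2 sigma^2) * sum_i (y_i - w^T x_i)^2 + N/2 * log(2 pi sigma^2)
    r = y - X @ w
    return r @ r / (2 * sigma**2) + N / 2 * np.log(2 * np.pi * sigma**2)

def simplified_nll(w):
    # NLL(w) = 1/2 * (y - Xw)^T (y - Xw); the sigma and pi terms are dropped
    r = y - X @ w
    return 0.5 * (r @ r)

w0 = np.zeros(D)
w_full = minimize(full_nll, w0).x
w_simple = minimize(simplified_nll, w0).x

print(np.allclose(w_full, w_simple, atol=1e-4))  # True: same minimizer
```

Both minimizations also agree (up to numerical tolerance) with the ordinary least squares solution $\hat{w} = (X^T X)^{-1} X^T y$.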
