Solved – Linear regression with Gaussian distribution

machine learning, maximum likelihood, normal distribution, regression, self-study

I am reading the book Machine Learning – A Probabilistic Perspective by Kevin P. Murphy.

In the chapter on linear regression, he introduces a method where you estimate the parameters of the Gaussian distribution via maximum likelihood estimation:

$$\hat{\theta} \triangleq \arg\max_{\theta} \log p(\mathcal{D} \mid \theta)$$

Instead of maximizing the likelihood, he minimizes the negative log-likelihood:

$$\mathrm{NLL}(\theta) \triangleq -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta)$$

This formula can be transformed to:

$$\mathrm{NLL}(\theta) = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - w^T x_i\right)^2 + \frac{N}{2}\log\left(2\pi\sigma^2\right)$$

I can still follow this.

However, afterwards he writes:

First, we rewrite the objective in a form that is more amenable to differentiation:

followed by this equation:

$$\mathrm{NLL}(w) = \frac{1}{2}(y - Xw)^T(y - Xw) = \frac{1}{2}\,w^T(X^T X)\,w - w^T(X^T y)$$

I don't understand how this equation equals $\mathrm{NLL}(\theta)$ from above. Where are all the $\sigma$, $\pi$, etc.?

I understand that he differentiates the function in order to obtain the optimal parameters, but I don't understand the transformation itself.

I hope this is enough to answer the question; I don't think I am allowed to post the whole chapter here, so I hope that someone has read the book.

Thanks in advance!

Best Answer

It looks like the author is rewriting the objective into a form that has the same minimizer in $w$ but is easier to differentiate. In other words, any $w$ that minimizes $\mathrm{NLL}(w)$ also minimizes the original $\mathrm{NLL}(\theta)$.
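To make the skipped step explicit, start from the transformed NLL quoted in the question and collect everything that does not depend on $w$ (treating $\sigma^2$ as a fixed constant):

$$
\mathrm{NLL}(\theta)
= \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - w^T x_i\right)^2 + \frac{N}{2}\log\left(2\pi\sigma^2\right)
= \frac{1}{\sigma^2}\underbrace{\left[\tfrac{1}{2}(y - Xw)^T(y - Xw)\right]}_{=\ \mathrm{NLL}(w)} \;+\; \underbrace{\frac{N}{2}\log\left(2\pi\sigma^2\right)}_{\text{constant in } w}
$$

where $X$ is the $N \times D$ design matrix with rows $x_i^T$ and $y = (y_1, \dots, y_N)^T$, so that $\sum_i (y_i - w^T x_i)^2 = (y - Xw)^T(y - Xw)$. The $\sigma$ and $\pi$ terms have not disappeared: they are absorbed into a positive multiplicative constant $1/\sigma^2$ and an additive constant, neither of which moves the minimizer in $w$. Expanding the quadratic and dropping the $w$-independent term $\frac{1}{2}y^T y$ gives the second form, $\frac{1}{2}w^T(X^T X)w - w^T(X^T y)$.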

The reason these simplifications work is that $\log$ is monotonically increasing, so maximizing the likelihood is equivalent to minimizing the negative log-likelihood, and neither adding a constant nor multiplying by a positive constant (here $1/\sigma^2$) changes where the minimum in $w$ is attained.
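As a quick sanity check (not from the book), you can verify numerically that minimizing the full NLL over $w$ and minimizing the simplified objective $\frac{1}{2}(y - Xw)^T(y - Xw)$ recover the same weights. The synthetic data and function names below are just for illustration, with $\sigma$ treated as a known constant:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))          # design matrix
w_true = np.array([1.5, -2.0, 0.5])  # made-up ground-truth weights
sigma = 0.3                          # treated as a fixed, known constant
y = X @ w_true + sigma * rng.normal(size=N)

def full_nll(w):
    # NLL(theta) = 1/(2 sigma^2) * sum_i (y_i - w^T x_i)^2 + N/2 * log(2 pi sigma^2)
    r = y - X @ w
    return r @ r / (2 * sigma**2) + N / 2 * np.log(2 * np.pi * sigma**2)

def simplified_nll(w):
    # NLL(w) = 1/2 * (y - Xw)^T (y - Xw); the sigma and pi terms are dropped
    r = y - X @ w
    return 0.5 * (r @ r)

w0 = np.zeros(D)
w_full = minimize(full_nll, w0).x
w_simple = minimize(simplified_nll, w0).x

print(np.allclose(w_full, w_simple, atol=1e-4))  # True: same minimizer
```

Both minimizations also agree (up to numerical tolerance) with the ordinary least squares solution $\hat{w} = (X^T X)^{-1} X^T y$.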
