Solved – Maximum Log-Likelihood Estimation interpretation results

Tags: maximum-likelihood, python

I have fitted four distributions to a sample using MLE.

The following example code was used to compute the MLE in Python:

from math import exp, log
from scipy.optimize import minimize

def distr(d, a):
    # Candidate density: f(d; a) = (d / a**2) * exp(-d / a)
    return (d / a**2) * exp(-d / a)

def log_L(params, diameters):
    # Negative log-likelihood; minimizing this maximizes the likelihood
    a = params[0]  # minimize passes the parameters as an array
    return -sum(log(distr(d, a)) for d in diameters)

# diameters is the observed sample (a sequence of positive values)
res = minimize(log_L, [1.0], args=(diameters,))

This returns output such as:

     fun: 737.6689924048228
hess_inv: array([[  5.68951613e-06]])
     jac: array([[ -1.52587891e-05]])
 message: Desired error not necessarily achieved due to precision loss.
    nfev: 164
     nit: 7
    njev: 51
  status: 2
 success: False
       x: array([ 0.37047972])

As far as I understand it, the "fun" value in the result of optimize.minimize is the optimized maximum log-likelihood. These are the "fun" values I got for my four distributions:

function1 = 580.05
function2 = 1293.68
function3 = 689.63
function4 = 737.67

I'm fairly confident the MLE code is correct, since I also calculated the MLE for function 4 analytically and obtained the identical fitted parameter.
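For completeness, the analytic calculation for the density in the code above goes like this:

$$ \ell(a) = \sum_{i=1}^{n} \ln\left( \frac{d_i}{a^2} e^{-d_i/a} \right) = \sum_{i=1}^{n} \ln d_i - 2n \ln a - \frac{1}{a} \sum_{i=1}^{n} d_i $$

Setting the derivative with respect to $a$ to zero,

$$ \frac{d\ell}{da} = -\frac{2n}{a} + \frac{1}{a^2} \sum_{i=1}^{n} d_i = 0 \quad \Rightarrow \quad \hat{a} = \frac{1}{2n} \sum_{i=1}^{n} d_i = \frac{\bar{d}}{2}, $$

which is the same value that minimize returned.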

This may sound like a stupid question, but do I have to take the smallest or the greatest value as my best fit? I suppose it's the smallest value since I minimized my log-likelihood, but I'm not completely sure.

Relatedly, when I then want to calculate the Akaike Information Criterion (AIC), computed in the following way:

AIC = 2*k - 2*"fun"

where k is the number of parameters and "fun" is the max. log-likelihood calculated above, would I take the greatest value as my best option?

I'd appreciate any answer you could give me very much!

Best Answer

Your question is a little confusing because you interchangeably talk about maximum likelihood estimation and "minimizing the log-likelihood". The estimate that maximizes the likelihood also maximizes the log-likelihood. However, many standard optimization algorithms by default minimize the function you give them. So to maximize the log-likelihood with such an algorithm, the solution is to pass it the negative of the log-likelihood. This also seems to be what you're doing in your code. This means that the "fun" values you're getting from minimize are not log-likelihoods but negative log-likelihoods. Thus lower values are indeed "better", because they correspond to higher likelihoods.
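As a quick illustration with the "fun" values you reported (the model names are just labels):

# Negative log-likelihoods ("fun" values) returned by minimize
neg_log_lik = {
    "function1": 580.05,
    "function2": 1293.68,
    "function3": 689.63,
    "function4": 737.67,
}

# Smallest negative log-likelihood = largest likelihood = best fit
best = min(neg_log_lik, key=neg_log_lik.get)
print(best)  # -> function1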

The formula for the AIC is: $$ AIC = 2k - 2\ln(\hat{L}) $$ where $\hat{L}$ is the maximized value of the likelihood function for your model. Your "fun" value corresponds to $-\ln(\hat{L})$, not $\ln(\hat{L})$, so the way you're currently computing the AIC is wrong. In your notation it should be AIC = 2*k + 2*"fun", i.e. '+' instead of '-' (because "fun" is already the negative log-likelihood). Note that lower AIC values are better.
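In code, with your values (the parameter counts k below are placeholders; substitute the actual number of fitted parameters for each of your distributions):

# "fun" values (negative log-likelihoods) from minimize
neg_log_lik = {
    "function1": 580.05,
    "function2": 1293.68,
    "function3": 689.63,
    "function4": 737.67,
}

# Number of fitted parameters per model -- placeholders, fill in your own
k = {"function1": 1, "function2": 1, "function3": 1, "function4": 1}

# AIC = 2k - 2*ln(L_hat) = 2k + 2*(negative log-likelihood)
aic = {m: 2 * k[m] + 2 * nll for m, nll in neg_log_lik.items()}

best = min(aic, key=aic.get)  # lower AIC is better
print(best, round(aic[best], 2))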