Solved – AIC, likelihood, loglikelihood confusion

aic, likelihood, r

AIC values (from a fitted model, for example) are positive. So are likelihood values. Are log-likelihood values positive or negative? On the Wikipedia page about the likelihood-ratio test, the log-likelihood values are negative, and the less negative value indicates the better fit.

But on this page there is -(log-likelihood), meaning the negative of the log-likelihood, and it says that the more negative value indicates the better fit. In R, the logLik function applied to a model gives negative values, but if I only have the likelihood and want the log-likelihood from it, do I have to take the negative of the logarithm?

So my question is fairly simple: when comparing these values (AIC, likelihood, log-likelihood) from two different models, which value (more or less negative) indicates the better model?

Here are some statistics as an example. Could someone "translate" this output into words for me? (I am using msts and tbats from the forecast package.)

> fit1$likelihood
[1] 90871.47
> fit2$likelihood
[1] 90785.92
> fit1$AIC
[1] 90909.47
> fit2$AIC
[1] 90839.92
# AIC from likelihood, par1 refers to number of fitted parameters
> 2*par1-2*log(fit1$likelihood) 
[1] -14.8344
> 2*par2-2*log(fit2$likelihood) # AIC from likelihood
[1] -10.83252

So why don't I get the same AIC values when calculating "by hand"? Which ones are correct?

Best Answer

AIC will be negative whenever $k<\ln(L)$ (see the question "Negative values for AIC in General Mixed Model" and its answers). Also keep in mind that the log-likelihood can be positive: for continuous data the likelihood is built from probability density values, which are never negative but can exceed $1$.

The interpretation, when comparing models, is the same regardless of sign. The lowest AIC (the most negative one, if the values are negative) gives the "best" model, in the sense that AIC favours models with small $k$ and large $\ln{(L)}$.

Regarding likelihood, the higher the "better". Since the logarithm is a monotonically increasing transformation, the same applies to the log-likelihood: the higher (less negative) value indicates the better fit. This is consistent with wanting to minimize $\text {AIC}$, where $\ln(L)$ enters with a negative sign:
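To make the point about positive log-likelihoods concrete, here is a minimal R sketch. The observations, the standard deviation `sigma`, and the parameter count `k` are made up for illustration; they have nothing to do with the models in the question.

```r
# Made-up observations tightly clustered around zero (illustrative only)
x <- c(-0.01, 0.00, 0.01)
sigma <- 0.02   # assumed standard deviation; a narrow density is taller than 1

# Each density value exceeds 1, so each log-density is positive
dens <- dnorm(x, mean = 0, sd = sigma)

# Log-likelihood is the sum of the log-densities
loglik <- sum(dnorm(x, mean = 0, sd = sigma, log = TRUE))

# Pretend a single parameter was estimated (hypothetical k)
k <- 1
aic <- 2 * k - 2 * loglik

dens     # all values greater than 1
loglik   # positive
aic      # negative, because k < loglik
```

Nothing is wrong with a model whose AIC is negative; it only means the density values at the observed data are large.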

$$\text {AIC} = 2k -2\ln(L)$$
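As a quick check of this formula, the following sketch rebuilds AIC from `logLik()` for two linear models on R's built-in `mtcars` data. The models and the helper name `aic_by_hand` are chosen purely for illustration; they are not the tbats fits from the question.

```r
# Two nested linear models on a built-in dataset (illustrative only)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: AIC = 2k - 2 ln(L), with k taken from logLik()
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")          # number of estimated parameters (includes error variance)
  2 * k - 2 * as.numeric(ll)
}

# The hand computation should agree with R's AIC()
all.equal(aic_by_hand(fit_small), AIC(fit_small))
all.equal(aic_by_hand(fit_large), AIC(fit_large))

# Compare the models: higher logLik is better, lower AIC is better
c(small = as.numeric(logLik(fit_small)), large = as.numeric(logLik(fit_large)))
c(small = AIC(fit_small), large = AIC(fit_large))
```

Note that `logLik()` already returns the log-likelihood, so no further `log()` is applied before plugging it into the formula.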


Regarding your example in R, I'm pretty sure fit1$likelihood and fit2$likelihood are actually two times the negative log-likelihood, i.e. $-2\ln(L)$, rather than the likelihood itself. The reason is simple:

fit1$likelihood - fit1$AIC
#[1] -38
fit2$likelihood - fit2$AIC
#[1] -54

Since $\text{AIC} - (-2\ln(L)) = 2k$, these differences equal $-2k$. If the AIC calculation is right, this means fit1 has 19 parameters and fit2 has 27 parameters.
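The arithmetic can be sketched in R using only the numbers printed in the question. This assumes, as argued above, that `$likelihood` stores $-2\ln(L)$; the variable names are illustrative.

```r
# Values copied from the question's output
neg2loglik <- c(fit1 = 90871.47, fit2 = 90785.92)  # reported as $likelihood (assumed to be -2 ln L)
aic        <- c(fit1 = 90909.47, fit2 = 90839.92)  # reported as $AIC

# AIC = 2k + (-2 ln L)  =>  k = (AIC - (-2 ln L)) / 2
k <- round((aic - neg2loglik) / 2)
k

# Implied log-likelihoods under this assumption
loglik <- -neg2loglik / 2
loglik

# Rebuilding AIC from k and the log-likelihood reproduces the reported values
all.equal(2 * k - 2 * loglik, aic)
```

Under this reading, the "by hand" calculation in the question went wrong because it applied `log()` to a quantity that was already on the log scale. The reported AIC values are then the ones to use, and fit2 has both the higher log-likelihood and the lower AIC.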