AIC – Understanding Negative AIC Values


My question is related to the thread Negative values for AIC in General Mixed Model. I often get negative AIC values from the software I use, most noticeably when I'm fitting time series models. But here is what I don't understand. When AIC is defined as

$$AIC = 2k-2\ln(L)$$

$L$, the likelihood, is a joint probability and, to my understanding, must be bounded between 0 and 1. Mathematically, this implies that $AIC$ must be positive. So I don't know what my software is giving me for the value labeled $AIC$. Any thoughts?

Best Answer

For continuous data, $L$ is not a joint probability but a joint probability density evaluated at the observed data. A density only needs to be non-negative and is not bounded from above, so $\ln(L)$ can be either positive or negative. Hence, $AIC$ can also be either positive or negative.
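To see this concretely, here is a minimal Python sketch (standard library only) that fits a normal distribution by maximum likelihood to a small, hypothetical set of tightly clustered observations. Because the fitted standard deviation is small, the fitted density is well above 1 near the data, so the maximized log-likelihood is positive and the AIC is negative. The function names and the data are illustrative assumptions, not taken from any particular software.

```python
import math


def normal_max_loglik(data):
    """Maximized Gaussian log-likelihood, with mu and sigma both fitted by ML."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # ML variance estimate (divides by n)
    # At the ML estimates the summed log densities reduce to -n/2 * (ln(2*pi*var) + 1)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def aic(loglik, k):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik


k = 2  # fitted parameters: mu and sigma
data = [0.01, -0.02, 0.03, 0.0, -0.01]  # hypothetical, tightly clustered observations
ll = normal_max_loglik(data)
print(f"log L = {ll:.2f}, AIC = {aic(ll, k):.2f}")

# The same observations expressed in a unit 1000 times smaller
scaled = [1000 * x for x in data]
ll_scaled = normal_max_loglik(scaled)
print(f"log L = {ll_scaled:.2f}, AIC = {aic(ll_scaled, k):.2f}")
```

The first fit gives a positive log-likelihood and a negative AIC; the rescaled data give a negative log-likelihood and a positive AIC. Multiplying every observation by a constant $c$ shifts the log-likelihood by $-n\ln(c)$, so the sign of AIC depends on the measurement units, and only AIC differences between models fitted to the same data carry meaning.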

Related Solutions

Solved – AIC, likelihood, loglikelihood confusion

AIC will be negative whenever $k<\ln(L)$ (see the question and answers in Negative values for AIC in General Mixed Model). Also keep in mind that the log-likelihood can be positive, since the likelihood function is usually a probability density: it is non-negative and can exceed $1$.
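Spelling out this condition, and using the fact that every model has at least one parameter ($k \ge 1$), shows why a negative AIC requires a likelihood greater than 1:

$$\text{AIC} < 0 \iff 2k - 2\ln(L) < 0 \iff \ln(L) > k \ge 1 \implies L > e$$

Since a probability can never exceed 1, this can only happen when $L$ is a density.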

The interpretation when comparing models is the same, though. The model with the smallest AIC (the most negative one, if the values are negative) is the "best" model, in the sense that AIC favors models with small $k$ and large $\ln(L)$.
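As a minimal sketch of such a comparison (Python, standard library only), the following fits two hypothetical normal models to the same made-up data: one with the mean fixed at 0 ($k=1$) and one with the mean estimated ($k=2$). Both AICs come out negative, and the preferred model is simply the one with the smaller value. The data, model labels, and helper names are illustrative assumptions.

```python
import math


def normal_max_loglik(data, mu=None):
    """Maximized Gaussian log-likelihood; mu is estimated unless it is fixed."""
    n = len(data)
    if mu is None:
        mu = sum(data) / n  # ML estimate of the mean
    var = sum((x - mu) ** 2 for x in data) / n  # ML variance given mu
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def aic(loglik, k):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik


data = [0.01, -0.02, 0.03, 0.0, -0.01]  # hypothetical observations

models = {
    "mean fixed at 0": aic(normal_max_loglik(data, mu=0.0), k=1),  # sigma only
    "mean estimated": aic(normal_max_loglik(data), k=2),  # mu and sigma
}

for name, value in models.items():
    print(f"{name}: AIC = {value:.2f}")

# The smallest AIC wins, even when every value is negative
best = min(models, key=models.get)
print("preferred:", best)
```

Here estimating the mean barely raises $\ln(L)$, so the extra parameter does not earn back its penalty of 2, and the simpler model has the smaller (more negative) AIC.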

Regarding likelihood, higher is "better". Since the log-likelihood is a monotonically increasing transformation of the likelihood, the same applies to it. This makes sense, since we want to minimize $\text {AIC}$, in which $\ln(L)$ enters with a negative sign:

$$\text {AIC} = 2k -2\ln(L)$$

Regarding your example in R, I'm pretty sure `fit1$likelihood` and `fit2$likelihood` are actually twice the negative log-likelihoods, i.e. $-2\ln(L)$. The reason is simple:

```r
fit1$likelihood - fit1$AIC
#[1] -38
fit2$likelihood - fit2$AIC
#[1] -54
```

If the AIC calculation is right, then since $-2\ln(L) - \left(2k - 2\ln(L)\right) = -2k$, these differences mean that fit1 has 19 parameters and fit2 has 27 parameters.
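The arithmetic can be checked with a short Python sketch. It assumes, as argued above, that the `likelihood` component holds $-2\ln(L)$; under that assumption the difference `likelihood - AIC` equals $-2k$ whatever the value of $\ln(L)$. The helper name `implied_num_params` is hypothetical.

```python
def implied_num_params(likelihood_minus_aic):
    """Recover k, assuming likelihood = -2 ln(L) and AIC = 2k - 2 ln(L).

    Then likelihood - AIC = -2 ln(L) - (2k - 2 ln(L)) = -2k.
    """
    return -likelihood_minus_aic / 2


# Differences reported in the R output above
for name, diff in [("fit1", -38), ("fit2", -54)]:
    print(name, implied_num_params(diff))

# Sanity check of the identity for a few arbitrary (hypothetical) log-likelihoods
k = 19
for loglik in [-500.0, -3.25, 0.0, 41.75]:
    minus2_loglik = -2 * loglik
    aic_value = 2 * k - 2 * loglik
    assert implied_num_params(minus2_loglik - aic_value) == k
```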

Solved – GAM (mgcv): AIC vs Deviance Explained

The respective formulas for these two quantities are:

$$\text{deviance} = 2\log\mathcal{L}(\text{saturated model}\, |\, \text{data}) - 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$

$$\text{AIC} = 2k - 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$

where $\mathcal{L}$ is the likelihood and $k$ is the number of model parameters. For a fixed dataset and model family, the saturated model is fixed, so for our purposes the equation for deviance becomes:

$$\text{deviance} = \text{constant} - 2\log\mathcal{L}(\text{model}\, |\, \text{data})$$

If AIC is plotted against deviance as you have done, we expect the points to fall along a straight line if there exist constants $c_1$ and $c_2$ such that:

$$c_1 \cdot \text{AIC} + c_2 \approx \text{deviance}$$

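Substituting the formulas above, this requires $2c_1 k + 2(1 - c_1)\log\mathcal{L} \approx \text{constant}$, which can only be the case if $k$ is (approximately) a linear function of $\log\mathcal{L}$ across the models being compared. Proportionality, $k \propto \log\mathcal{L}$, is one such case, and $c_1 = 1$ corresponds to all models having the same $k$. Although this is not a relationship that I have previously come across, it seems plausible.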
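The condition can be checked numerically with a small Python sketch. It builds two sets of hypothetical models that share an arbitrary saturated log-likelihood: in the first, $k$ is proportional to $\log\mathcal{L}$; in the second, $k$ is unrelated to it. Only the first produces (AIC, deviance) points on a straight line. All numbers and helper names are invented for illustration.

```python
def aic(k, loglik):
    """AIC = 2k - 2 log L."""
    return 2 * k - 2 * loglik


def deviance(loglik, loglik_saturated):
    """Deviance = 2 log L(saturated) - 2 log L(model)."""
    return 2 * loglik_saturated - 2 * loglik


def collinear(points, tol=1e-9):
    """Return True if all (x, y) points lie on one straight line."""
    (x0, y0), (x1, y1) = points[0], points[1]
    for x, y in points[2:]:
        # 2D cross product of (p1 - p0) and (p - p0); zero means p is on the line
        cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
        if abs(cross) > tol:
            return False
    return True


LL_SATURATED = 100.0  # hypothetical; the same constant for every model
logliks = [20.0, 40.0, 60.0, 80.0]  # hypothetical model log-likelihoods

# Case 1: k proportional to log L (k = 0.5 * log L)
ks_linear = [0.5 * ll for ll in logliks]
pts_linear = [(aic(k, ll), deviance(ll, LL_SATURATED)) for k, ll in zip(ks_linear, logliks)]

# Case 2: k unrelated to log L
ks_other = [12, 3, 30, 7]
pts_other = [(aic(k, ll), deviance(ll, LL_SATURATED)) for k, ll in zip(ks_other, logliks)]

print("k proportional to log L -> collinear:", collinear(pts_linear))
print("k unrelated to log L    -> collinear:", collinear(pts_other))
```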

However, it could also be that a different formula for deviance is being used altogether, as intimated here.
