Solved – AIC criterion: definition

aic, regression

I have two questions about the AIC criterion: $\mathrm{AIC}=2k-2\ln(l)$, where $k$ is the number of estimated parameters and $l$ is the maximized likelihood.

Where does the number 2 come from? Since we usually minimize it, why don't we consider only $k-\ln(l)$? (Maybe I am missing something.)

Second question: can we consider that AIC selection follows the same "philosophy" as lasso/ridge selection?

Indeed, if we adopt the convention $0^{0}=0$, we would then have:

AIC selection: $\max\left[\ln(l)-\lambda\sum_j|\beta_j|^{0}\right]$ (with $\lambda=1$)

Lasso selection: $\max\left[\ln(l)-\lambda\sum_j|\beta_j|^{1}\right]$

Ridge selection: $\max\left[\ln(l)-\lambda\sum_j|\beta_j|^{2}\right]$
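To make the three penalty terms concrete, here is a minimal Python sketch that evaluates $\sum_j|\beta_j|^{p}$ for $p=0,1,2$. The coefficient vector `beta` and the helper `penalty` are hypothetical, chosen only for illustration. Note that the $0^{0}=0$ convention has to be coded explicitly, since NumPy evaluates `0.0 ** 0` as `1.0`. Also, the $k$ in AIC usually counts every estimated parameter (for example an intercept or an error variance), so equating it with the number of nonzero $\beta_j$ is only an approximation.

```python
import numpy as np

def penalty(beta, power):
    """Sum of |beta_j|**power, using the convention 0**0 = 0."""
    beta = np.asarray(beta, dtype=float)
    if power == 0:
        # NumPy gives 0.0 ** 0 == 1.0, so count nonzero coefficients directly.
        return float(np.count_nonzero(beta))
    return float(np.sum(np.abs(beta) ** power))

beta = [1.5, 0.0, -2.0, 0.0]  # hypothetical coefficient vector

l0 = penalty(beta, 0)  # number of nonzero coefficients (AIC-style count)
l1 = penalty(beta, 1)  # lasso-style penalty
l2 = penalty(beta, 2)  # ridge-style penalty

print(l0, l1, l2)  # prints: 2.0 3.5 6.25
```

The $p=0$ penalty only changes when a coefficient enters or leaves the model, whereas the $p=1$ and $p=2$ penalties vary continuously with the size of the coefficients.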

Best Answer

Of course, you get the same answer without the factor of 2. Burnham & Anderson refer to Akaike's multiplication by -2 as done for "historical reasons." I believe what they mean is the following. Historically, AIC was developed in the context of linear regression, which assumes the errors are i.i.d. with mean 0. One of the classic ways to fit such models was chi-square fitting. When the errors are additionally Gaussian with known variance, twice the negative log-likelihood (NLL) equals the chi-square value up to an additive constant that does not depend on the fitted parameters (see https://en.wikipedia.org/wiki/Akaike_information_criterion#Chi-squared_fits). I believe that is why Akaike multiplied the log-likelihood by -2: to put it on the same scale as the chi-square statistic. Also, see section 4 of this paper.
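Both points can be checked numerically. The sketch below is an illustration under stated assumptions, not anything from Akaike or Burnham & Anderson: it simulates hypothetical data (a straight line plus Gaussian noise with a known `sigma`), fits polynomials of degree 0 to 3 by least squares, and compares $-2\ln(l)$ with the chi-square value. It then ranks the models with $2k-2\ln(l)$ and with $k-\ln(l)$. Here `k` counts only the polynomial coefficients, because `sigma` is treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: straight line plus Gaussian noise with known sigma.
n, sigma = 50, 0.5
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, size=n)

def fit_polynomial(degree):
    """Least-squares polynomial fit; returns (k, log-likelihood, chi-square)."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    chi2 = np.sum((resid / sigma) ** 2)
    # Gaussian log-likelihood with known sigma.
    loglik = -0.5 * chi2 - n * np.log(sigma * np.sqrt(2.0 * np.pi))
    k = degree + 1  # number of fitted coefficients
    return k, loglik, chi2

results = [fit_polynomial(d) for d in range(4)]

# -2 ln(l) minus chi-square: the same constant for every model.
const = [-2.0 * ll - c2 for _, ll, c2 in results]

# Ranking with and without the factor of 2.
aic = [2 * k - 2 * ll for k, ll, _ in results]
half_aic = [k - ll for k, ll, _ in results]
best_aic = int(np.argmin(aic))
best_half = int(np.argmin(half_aic))

print("offsets:", np.round(const, 6))
print("same model selected:", best_aic == best_half)
```

The offset is $2n\ln(\sigma\sqrt{2\pi})$, which depends only on the data size and the noise level, so comparing models by $-2\ln(l)+2k$ or by $\chi^2+2k$ gives the same ordering. Halving the criterion likewise rescales every model's score by the same factor and leaves the selected model unchanged.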