Solved – Does Akaike’s Information Criterion correspond to a type of regularization?

Tags: aic, error, regularization, self-study

According to this source, Akaike's Information Criterion can be defined as:

$AIC = N \log(\frac{SSE}{N}) +2(k+2)$

where SSE is the sum of squared errors and k is the number of parameters in the model.

The first part looks like a loss function, so it seems that minimizing AIC means minimizing the loss as well as the complexity of the model (assuming smaller k implies less complexity).
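A minimal sketch of that trade-off, using the AIC form quoted above (the data, the `fit_sse` helper, and the choice to count the intercept in $k$ are assumptions made for illustration): adding an irrelevant predictor always lowers the SSE a little, but the $2(k+2)$ term can still push the AIC up.

```python
import numpy as np

def aic(sse, n, k):
    # AIC in the form quoted above: n*log(SSE/n) + 2*(k + 2)
    return n * np.log(sse / n) + 2 * (k + 2)

def fit_sse(X, y):
    # Ordinary least squares; return the sum of squared residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # pure-noise predictor
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])        # k = 2
X_big   = np.column_stack([np.ones(n), x1, x2])    # k = 3

print("AIC small:", aic(fit_sse(X_small, y), n, k=2))
print("AIC big:  ", aic(fit_sse(X_big, y), n, k=3))
# The bigger model has a (slightly) smaller SSE, but typically a larger AIC,
# because the extra parameter adds 2 to the penalty term.
```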

Am I correct in interpreting the second term $2(k+2)$ as leading to a type of regularization similar to lasso or ridge regression? And is this the reason why it is used instead of the MSE or the MAPE in forecasting models?

If this is indeed the case, why can't the second term just be $k$ instead of $2(k+2)$?

Best Answer

It can indeed be interpreted as a type of regularization, but it is not similar to lasso or ridge regression. The latter two regularize by penalizing nonzero values of the parameter estimates as part of the estimation procedure, whereas AIC, in effect, penalizes the number of parameters included in the model, which is typically specified by the modeler rather than determined inside the estimation procedure (with exceptions for certain automated model selection procedures such as stepwise regression).
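A small sketch of that mechanical difference, assuming a closed-form ridge fit and ordinary least squares for the AIC comparison (the data, the penalty value, and the candidate model specifications are made up for illustration): ridge shrinks the coefficient values inside the fit itself, whereas AIC is computed after each candidate model is fit and penalizes how many parameters the candidate contains.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 80, 5
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(size=n)

# Ridge: the penalty lam * ||beta||^2 is part of the estimation itself;
# all p coefficients are kept but shrunk toward zero.
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# AIC: each candidate specification is fit by plain least squares,
# and the penalty counts the parameters when comparing the fits.
def aic(X_sub, y):
    k = X_sub.shape[1]
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    sse = float(np.sum((y - X_sub @ beta) ** 2))
    return len(y) * np.log(sse / len(y)) + 2 * (k + 2)

print("ridge coefficients:", np.round(beta_ridge, 2))   # shrunk, none removed
print("AIC, all 5 predictors :", aic(X, y))
print("AIC, predictors 1 & 4 :", aic(X[:, [0, 3]], y))  # a smaller candidate model
```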

AIC is intended to facilitate comparisons between models, and the coefficient of 2 multiplying the $k$ is derived from asymptotic considerations. These slides provide a very accessible introduction to how the AIC is derived. I agree that the additive "4" is not useful and can be removed. However, given the derivation, it is clear that the factor of "2" should remain: it makes the penalty for adding parameters to the model twice as large as it would otherwise be, so it really is not the same as a penalty of just $k$, and it has some asymptotic optimality properties.
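To make this concrete, compare two models fit to the same data using the formula quoted in the question:

$$
\mathrm{AIC}_1 - \mathrm{AIC}_2 = N \log\frac{SSE_1}{N} - N \log\frac{SSE_2}{N} + 2(k_1 - k_2)
$$

The constant hidden in $2(k+2) = 2k + 4$ appears in both models and cancels in the difference, so it never affects which model is preferred; the factor of 2 does not cancel, and it doubles the price of each additional parameter relative to a penalty of just $k$.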