Solved – AIC, model selection and variable scale

aic, feature selection

In looking at the formula for the AIC, AIC = -2*LL + 2k, and the formula for the log likelihood, LL = -n/2*log(2*pi) - n/2*log(sse/n) - n/2, I notice that the term with sse is sensitive to the scale of the dependent variable but the other terms are not.

This seems to mean there might be a case where changing measuring units of the dependent variable from, say, kilograms to grams would cause the term (sse/n) to increase while the other terms would remain constant. Depending on the values of the other terms, this change of measuring units could potentially change which model has the lowest AIC.

I have two questions: First, is my reasoning mistaken? Second, assuming I have not made a mistake, how should someone use the AIC to select a model when the result is sensitive to choice of measuring units?

Best Answer

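Differences in AIC between models do not depend on the scale used for the data. Say the dependent variable is multiplied by a positive scaling factor $c$. Every residual is then multiplied by $c$, so sse is multiplied by $c^2$, and the sse term becomes $$ \frac{n}{2} \log\left(\frac{c^2\times \textrm{sse}}{n}\right) = n \log\left(c\right) + \frac{n}{2} \log\left(\frac{\textrm{sse}}{n}\right). $$ The first term on the right is a constant that depends only on $c$ and $n$, not on the model, so it shifts the AIC of every model by the same amount and cancels when two models are compared. Hence, the choice of units doesn't matter as long as you are comparing models fitted to the same data set with the same number of data points.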
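A quick numerical check of this argument, as a minimal sketch: the data are simulated, the helper `gaussian_aic` and the variable names are made up for illustration, and $k$ is taken to be the number of regression coefficients (some software also counts the error variance, which adds the same constant to every model). It fits two nested least-squares models to a response in "kilograms" and again in "grams" and compares the AIC differences.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of a least-squares fit, using
    LL = -n/2*log(2*pi) - n/2*log(sse/n) - n/2 and AIC = -2*LL + 2k.
    Here k is the number of columns of X (an assumed convention)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    ll = -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sse / n) - n / 2
    return -2 * ll + 2 * k

# Simulated data (hypothetical): response in kilograms, two predictors.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y_kg = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

# Two candidate models: intercept + x1, and intercept + x1 + x2.
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])

# Change of units: kilograms -> grams.
c = 1000.0
y_g = c * y_kg

diff_kg = gaussian_aic(y_kg, X_small) - gaussian_aic(y_kg, X_big)
diff_g = gaussian_aic(y_g, X_small) - gaussian_aic(y_g, X_big)
shift = gaussian_aic(y_g, X_big) - gaussian_aic(y_kg, X_big)

print("AIC difference in kg:", diff_kg)
print("AIC difference in g: ", diff_g)
print("shift in AIC from rescaling:", shift, "vs 2*n*log(c):", 2 * n * np.log(c))
```

The two differences should agree up to floating-point error, while each individual AIC moves by $2n\log(c)$, which is the constant term above multiplied by the factor of 2 in the AIC.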