Solved – How is the Akaike information criterion (AIC) affected by sample size

Tags: aic, distributions, logistic, regression, sample

I am evaluating several logistic regression models predicting college student retention. I am using some basic and well-established predictors, such as high school GPA and SAT scores. I understand that when evaluating competing models, a lower AIC is generally preferable.

I had expected that using several years of data might improve my models, but I have noticed that the more years of records I add to the model, the higher the AIC becomes. For instance, here are the AICs for several models built from differing numbers of years of student records (going back in time from the most recent academic year):

  • Using 9 years of data: AIC = 4314
  • Using 5 years of data: AIC = 2789
  • Using 4 years of data: AIC = 2312
  • Using 3 years of data: AIC = 1776
  • Using 2 years of data: AIC = 1139
  • Using 1 year of data: AIC = 512

Do these results mean that the AIC becomes inflated or less meaningful with larger sample sizes, or do they imply that the years of records in my dataset are so dramatically different from one another that I probably need to fit a separate model for each year, or something else?

Best Answer

AIC has no particular meaning when compared across different data sets. Yes, the AIC value generally changes as $n$ increases. However, AIC is self-referential: one can only compare different models fitted to the SAME data set, not to different data sets. Even that comparison has caveats; it most reliably identifies the probably better model among nested models (models in a set/subset relationship, that is, when every model tested can be obtained by eliminating parameters from the most inclusive model).

Some experts suggest that AIC can also identify the probably better model among non-nested models, but there are counterexamples; see this Q/A. Perhaps a more meaningful question, which the OP's question above only indirectly implies, is "How well can AIC discriminate between two models as the sample grows larger?", and the answer to that is: apparently better with increasing $n$. This is not unexpected, in the sense that AIC is only asymptotically correct; e.g., from Wikipedia: "We ... choose the candidate model that minimized the information loss. We cannot choose with certainty (sic, italics are mine), because we do not know f (sic, the unknown data-generating process). Akaike (1974) showed, however, that we can estimate, via AIC, how much more (or less) information is lost by g1 than by g2. The estimate, though, is only valid asymptotically; if the number of data points is small, then some correction is often necessary (see AICc...)."
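As a concrete illustration of the "same data set" rule, here is a minimal sketch in Python with statsmodels; the column names `retained`, `hs_gpa`, and `sat`, and the simulated data, are hypothetical placeholders rather than the OP's actual variables. It compares two nested logistic models fitted to one fixed data set, which is the situation where an AIC comparison is meaningful:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the OP's data: one fixed data set, two nested models.
rng = np.random.default_rng(42)
n = 2000
hs_gpa = rng.normal(3.0, 0.5, n)
sat = rng.normal(1100, 150, n)
logit_p = -6.0 + 1.5 * hs_gpa + 0.002 * sat          # assumed "true" process
retained = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))
df = pd.DataFrame({"retained": retained, "hs_gpa": hs_gpa, "sat": sat})

# Nested models fitted to the SAME rows: the first is the second with SAT removed.
m1 = smf.logit("retained ~ hs_gpa", data=df).fit(disp=0)
m2 = smf.logit("retained ~ hs_gpa + sat", data=df).fit(disp=0)

print(f"AIC (GPA only): {m1.aic:.1f}")
print(f"AIC (GPA + SAT): {m2.aic:.1f}")
# This comparison is meaningful because both AICs come from the same data;
# comparing either number to an AIC computed on a different sample is not.
```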

Now consider some arbitrary examples of how AIC changes. The first examines how AIC varies for the same random standard normal variate under different seeds. Shown below is a histogram of 1000 (normal-distribution) model AIC values, each computed from 100 random standard normal outcomes.

[Figure: histogram of 1000 simulated AIC values, each from a normal model fitted to n = 100 standard normal outcomes]

This shows a distribution for which normality is not excluded, with $\mu \approx -497.672$ and $\sigma \approx 48.5034$. It illustrates that the mean AIC over 1000 independent repetitions at $n=100$ is an educated guess for the location of AIC at that sample size. Next, we apply this "educated guess" across a range of sample sizes and fit the resulting trend:
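A minimal sketch of this kind of simulation in Python follows; the original appears to have been run in other software (the notation suggests Mathematica), so the exact mean and standard deviation will differ from the figures quoted above. It fits a normal model by maximum likelihood to each of 1000 samples of n = 100 standard normal draws and collects the AIC values:

```python
import numpy as np
from scipy.stats import norm

def normal_aic(x):
    """AIC of a normal model fitted to x by maximum likelihood (k = 2 parameters)."""
    mu_hat, sigma_hat = x.mean(), x.std(ddof=0)   # MLEs of mean and sd
    loglik = norm.logpdf(x, loc=mu_hat, scale=sigma_hat).sum()
    return 2 * 2 - 2 * loglik

rng = np.random.default_rng(1)
aics = np.array([normal_aic(rng.standard_normal(100)) for _ in range(1000)])
print(f"mean AIC = {aics.mean():.1f}, sd = {aics.std():.1f}")
# The 1000 AIC values form an approximately bell-shaped histogram; their mean
# estimates the typical AIC location for this model at n = 100.
```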

[Figure: mean AIC (over 1000 independent trials) versus sample size n = 5, 10, 15, ..., 95, 100, with fitted trend curve]

This plot shows how the mean AIC (from 1000 independent trials) changes as the number of random outcomes per trial increases through $n=5,10,15,\ldots,95,100$. The trend is approximately cubic with an SE of about 1 AIC unit (R$^2=0.999964$). The meaning of this is like the sound of one hand clapping: all we have done is find a result consistent with AIC being a better discriminator for increasing $n$; without a second model to compare against in each trial, we cannot detect anything. The only question remaining is why the AIC values increase with more data in the OP's question: the deviance term $-2\ln\hat{L}$ accumulates one log-density contribution per observation, so it, and hence AIC, typically grows roughly in proportion to $n$, which is exactly why AICs from data sets of different sizes are not comparable. Some software packages will sometimes show $-\mathrm{AIC}$ values in tables, so that more is better rather than less is better, but use the AIC values themselves for discriminating between models.
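Reusing the `normal_aic` helper from the sketch above, the trend in this second plot can be reproduced in outline by averaging AIC over 1000 trials at each sample size; again this is only an illustrative Python sketch, and the exact values will not match the original figures:

```python
import numpy as np
from scipy.stats import norm

def normal_aic(x):
    """AIC of a normal model fitted to x by maximum likelihood (k = 2 parameters)."""
    mu_hat, sigma_hat = x.mean(), x.std(ddof=0)
    loglik = norm.logpdf(x, loc=mu_hat, scale=sigma_hat).sum()
    return 2 * 2 - 2 * loglik

rng = np.random.default_rng(2)
for n in range(5, 105, 5):
    mean_aic = np.mean([normal_aic(rng.standard_normal(n)) for _ in range(1000)])
    print(f"n = {n:3d}  mean AIC = {mean_aic:8.1f}")
# The mean AIC changes steadily with n because -2*loglik accumulates one
# log-density term per observation; by itself this says nothing about model
# quality, only that AICs at different n are on different scales.
```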
