Solved – AIC, model selection and overfitting

aic, model selection, overfitting, references

I am looking for references that specifically show that Akaike's Information Criterion (AIC), or its corrected form (AICc), can in some practical applications (that is, outside the asymptotic regime) severely underestimate the penalty for model complexity, favoring overly complex models that then perform worse on new data; and possibly ways to detect this "failure mode" of AIC (the obvious one I can think of is cross-validation).
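
To make the failure mode I have in mind concrete, here is a minimal simulation sketch (my own toy setup, not taken from any reference: polynomial regression on a small sample, with a Gaussian-likelihood AIC) comparing the polynomial degree chosen by AIC with the one chosen by leave-one-out cross-validation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns coefficients (increasing order) and RSS."""
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss

def aic(rss, n, n_coef):
    """Gaussian AIC: n*log(RSS/n) + 2k, counting the noise variance as a parameter."""
    return n * np.log(rss / n) + 2 * (n_coef + 1)

def loo_cv_mse(x, y, degree):
    """Leave-one-out cross-validated mean squared prediction error."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coef, _ = fit_poly(x[mask], y[mask], degree)
        pred = np.polyval(coef[::-1], x[i])   # polyval wants decreasing order
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

n, true_degree, n_sims = 20, 1, 500
degrees = list(range(1, 7))
aic_overfits = cv_overfits = 0
for _ in range(n_sims):
    x = rng.uniform(-1.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)      # true model is a straight line
    aic_scores = [aic(fit_poly(x, y, d)[1], n, d + 1) for d in degrees]
    cv_scores = [loo_cv_mse(x, y, d) for d in degrees]
    aic_overfits += degrees[int(np.argmin(aic_scores))] > true_degree
    cv_overfits += degrees[int(np.argmin(cv_scores))] > true_degree

print(f"AIC chose degree > {true_degree} in {aic_overfits / n_sims:.0%} of runs")
print(f"LOO-CV chose degree > {true_degree} in {cv_overfits / n_sims:.0%} of runs")
```

The exact numbers depend on the seed, the sample size and the noise level; the point is only that the two criteria can disagree on small samples, and that a large gap between the AIC ranking and the cross-validation ranking is a cheap warning sign.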

More generally, I am also looking for an authoritative reference, beyond basic common sense, that advises against "blind model selection", that is, declaring a hypothesis true merely because some criterion favors the corresponding model, without "predictive checks" or other forms of independent validation. Ideally, I am looking for a strong statement (e.g., something like this, perhaps a bit less graphic), with examples of why it is such a bad idea.

Any suggestions, off the top of your head?

(As you would expect, there is a massive number of questions related to AIC and model selection on this website, but I could not find anything that specifically addresses my issue.)

PS: To clarify, regarding the first question, I am interested in references that talk about AIC, but it's fine if the paper discusses information criteria in general (e.g., both AIC and BIC), as long as AIC is included.

Best Answer

The following is not exactly what you need, as it applies to any information criterion (AIC and BIC included), but it is still an interesting criticism of blind selection in model-rich environments.

This is from the Abstract:

The implication is that good in-sample fit translates into poor out-of-sample fit, one-to-one. The result exposes a winner's curse problem when multiple models are compared in terms of their in-sample fit. The winner's curse has important implications for model selection by standard information criteria such as AIC and BIC.

This is from the Introduction:

Model selection by standard information criteria, such as AIC and BIC, tends to favor models that have a large $\eta$ in the sample used for estimation. We shall refer to this as the winner's curse of model selection. The winner's curse is particularly relevant in model-rich environments where many models may have a similar expected fit when evaluated at their respective population parameters. So we will argue that standard information criteria are poorly suited for selecting a model with a good out-of-sample fit in model-rich environments.
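
To illustrate the mechanism (this is my own toy construction, not a simulation from the paper): take many candidate single-regressor models with identical population fit (in this extreme case, no fit at all). The model chosen for its in-sample fit then does worse out of sample than a model picked at random:

```python
import numpy as np

rng = np.random.default_rng(123)
n, n_models, n_sims = 50, 30, 2000
selected_mse, random_mse = [], []

for _ in range(n_sims):
    # Every candidate model regresses y on a single regressor x_j; since y is pure
    # noise, all candidates have exactly the same population (out-of-sample) fit.
    X_in = rng.normal(size=(n, n_models))
    X_out = rng.normal(size=(n, n_models))
    y_in = rng.normal(size=n)
    y_out = rng.normal(size=n)

    # OLS slope of y on x_j for each candidate (intercept omitted for brevity).
    betas = X_in.T @ y_in / (X_in ** 2).sum(axis=0)
    rss_in = ((y_in[:, None] - X_in * betas) ** 2).sum(axis=0)
    mse_out = ((y_out[:, None] - X_out * betas) ** 2).mean(axis=0)

    selected_mse.append(mse_out[np.argmin(rss_in)])      # winner of the in-sample contest
    random_mse.append(mse_out[rng.integers(n_models)])   # arbitrary candidate, for reference

print(f"out-of-sample MSE of the model with the best in-sample fit: {np.mean(selected_mse):.3f}")
print(f"out-of-sample MSE of a randomly chosen model:               {np.mean(random_mse):.3f}")
```

The candidate that wins the in-sample comparison is precisely the one whose noise happened to look most like signal, so its estimated coefficient is the most inflated and its out-of-sample error the worst; an AIC- or BIC-style complexity penalty does not help here, because all candidates have the same number of parameters.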