Solved – AIC can recommend an overfitting model

Tags: aic, model selection

I have a question about using AIC for model selection – specifically, a case in which (based on my understanding) it may not recommend the best predictive model. I understand AIC has two terms: goodness of fit (which can be obtained by finding the error on the model's training dataset) and a complexity term (2 × the number of parameters in the model). I discuss the case below:

I have two models. The first is a non-parametric model that interpolates each data point and overfits, so its number of parameters (K) equals the sample size (let's say 500). Its SSE on the training set (goodness of fit) is very good, say 1e-4 (since it overfits the training data). Its AIC, calculated with the formula n*ln(SSE/n) + 2K, would be -6712.
The second model is a parametric model (a 2nd-order polynomial regression model) with 6 parameters. Its goodness of fit is not as good as the non-parametric model's, with an SSE of 1e-1. Its calculated AIC would be -4246.
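For concreteness, the two AIC values above can be reproduced with the least-squares form of the criterion, n*ln(SSE/n) + 2K (a quick sketch; small differences from the quoted figures are just rounding):

```python
import math

def aic_ls(n, sse, k):
    """Least-squares AIC: n * ln(SSE/n) + 2K."""
    return n * math.log(sse / n) + 2 * k

n = 500
aic_interp = aic_ls(n, sse=1e-4, k=500)  # interpolating model, K = n
aic_poly = aic_ls(n, sse=1e-1, k=6)      # 2nd-order polynomial, K = 6

print(aic_interp, aic_poly)  # roughly -6712 and -4247
assert aic_interp < aic_poly  # AIC prefers the interpolating model
```

So even with a penalty of 2K = 1000, the interpolating model's near-zero training SSE dominates, and AIC picks it.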

Based on ΔAIC, we would select model 1, but we know that model 1 overfits the data and hence would not generalize well to new data.

So how do we use AIC in cases where a model overfits the data but the complexity term does not penalize it enough to reject it? Does this case imply we cannot use AIC to choose between parametric and non-parametric models?

Best Answer

AIC can most definitely select an overfit model, for example because you

  1. only assess overfit models (so one of them gets selected), or
  2. offer up an overfit model against an inappropriate model (which seems to be your example), a very overfit model, or an underfit model, or
  3. compare many models via AIC (the more models, the worse this gets), so that by testing several you end up overfitting via the model selection itself.

While AIC attempts to balance fit to the training data against model complexity, nothing inherent to it guarantees that model selection will result in a non-overfit model. Of course, it penalizes model complexity more than selecting solely on fit to the training data would, so in that sense overfitting ought to be somewhat better avoided. In fact (see point 3 above), the very act of model selection carries the potential to overfit the data on which the models are compared; model averaging (among various other approaches) has been proposed to mitigate this issue (see e.g. Model Selection and Multimodel Inference by Burnham and Anderson).
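Point 3 can be given a rough back-of-the-envelope number. Under the null, the likelihood-ratio statistic for one irrelevant extra parameter is asymptotically χ² with 1 degree of freedom, and AIC prefers the larger model whenever that statistic exceeds the penalty difference of 2. A sketch using only the standard library (the χ²₁ tail follows from the identity P(χ²₁ > x) = erfc(√(x/2)), and the candidates are assumed independent, which is a simplification):

```python
import math

def chi2_1_tail(x):
    """P(chi-squared with 1 df > x) = P(Z^2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

# AIC prefers an irrelevant one-parameter addition when the
# likelihood-ratio statistic exceeds the AIC penalty of 2.
p_one = chi2_1_tail(2.0)  # ~0.157 per irrelevant candidate

# Chance that at least one of m independent irrelevant candidates
# beats the true (null) model on AIC:
for m in (1, 5, 20):
    p_any = 1 - (1 - p_one) ** m
    print(m, round(p_any, 3))
```

So a single irrelevant candidate slips past AIC about 16% of the time, and with 20 candidates the chance that at least one of them beats the true model rises above 95% under these assumptions, which is why comparing many models via AIC invites selection-induced overfitting.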
