Solved – AIC versus cross validation in time series: the small sample case

aic, cross-validation, forecasting, model selection, time series

I am interested in model selection in a time series setting. For concreteness, suppose I want to select an ARMA model from a pool of ARMA models with different lag orders. The ultimate intent is forecasting.

Model selection can be done by

  1. cross validation,
  2. use of information criteria (AIC, BIC),

among other methods.

Rob J. Hyndman provides a way to do cross validation for time series. For relatively small samples, the sample size used in cross validation may be qualitatively different from the original sample size. For example, if the original sample size is 200 observations, one could start cross validation by fitting on the first 100 observations, forecasting observation 101, and then expanding the window to 101, 102, …, 199 observations, yielding 100 one-step forecast errors. Clearly, a model that is reasonably parsimonious for 200 observations may be too large for 100 observations, and thus its validation error will be large. Thus cross validation is likely to systematically favour overly parsimonious models. This is an undesirable effect due to the mismatch in sample sizes.
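
A minimal sketch of this expanding-window scheme, assuming the series is a NumPy array `y` of 200 observations and using statsmodels' `ARIMA` class (the placeholder data, candidate orders, and helper name are illustrative, not from the original post):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def expanding_window_cv_mse(y, order, start=100):
    """One-step-ahead forecast MSE from an expanding training window."""
    errors = []
    for t in range(start, len(y)):
        fit = ARIMA(y[:t], order=order).fit()      # refit on the first t observations
        forecast = fit.forecast(steps=1)[0]        # forecast observation t + 1
        errors.append(y[t] - forecast)
    return np.mean(np.square(errors))

# Placeholder series of 200 observations; replace with the actual data.
y = np.random.default_rng(0).standard_normal(200)

# Candidate ARMA(p, q) orders, written as (p, d=0, q) for statsmodels.
candidates = [(p, 0, q) for p in range(3) for q in range(3)]
cv_mse = {order: expanding_window_cv_mse(y, order) for order in candidates}
best_order = min(cv_mse, key=cv_mse.get)
```

Note that each candidate is refit 100 times here, so the procedure is considerably more expensive than a single full-sample fit.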

An alternative to cross validation is using information criteria for model selection. Since I care about forecasting, I would use AIC. Even though AIC is asymptotically equivalent to minimizing the out-of-sample one-step forecast MSE for time series models (according to this post by Rob J. Hyndman), I doubt this is relevant here since the sample sizes I care about are not that large…
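
For comparison, AIC-based selection over the same pool of candidate orders fits each model only once, on the full sample. A minimal sketch under the same illustrative assumptions (statsmodels' `ARIMA`, the `y` series from the previous snippet):

```python
from statsmodels.tsa.arima.model import ARIMA

def select_by_aic(y, max_p=2, max_q=2):
    """Fit each ARMA(p, q) candidate on the full sample and return the lowest-AIC order."""
    aic_by_order = {}
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            fit = ARIMA(y, order=(p, 0, q)).fit()
            aic_by_order[(p, q)] = fit.aic
    best = min(aic_by_order, key=aic_by_order.get)
    return best, aic_by_order

best_order, aic_table = select_by_aic(y)  # `y` as in the previous sketch
```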

Question: should I choose AIC over time series cross validation for small/medium samples?

A few related questions can be found here, here and here.

Best Answer

Setting theoretical considerations aside, the Akaike Information Criterion is just the likelihood penalized by the degrees of freedom. Consequently, AIC accounts for the uncertainty in the data (-2LL) and assumes that more parameters lead to a higher risk of overfitting (2k). Cross-validation just looks at the test set performance of the model, with no further assumptions.
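
For reference, in the usual notation the criterion described above is

$$\mathrm{AIC} = -2\ln\hat{L} + 2k,$$

where $\hat{L}$ is the maximized likelihood of the fitted model and $k$ is the number of estimated parameters.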

If you care mostly about making predictions and you can assume that the test set(s) would be reasonably similar to the real-world data, you should go for cross-validation. The possible problem is that when your data set is small, splitting it leaves you with small training and test sets. Less data for training is bad, and less data in the test set makes the cross-validation results more uncertain (see Varoquaux, 2018). If your test sample is insufficient, you may be forced to use AIC, but keep in mind what it measures and what assumptions it makes.

On the other hand, as already mentioned in the comments, AIC gives you asymptotic guarantees, which do not hold for small samples. Small samples may be misleading about the uncertainty in the data as well.