Time Series Forecasting – Purpose of Model AIC vs. Forecast RMSE at Each Lag

aic, forecasting, model selection, rms, time series

If the purpose of time series modelling is to build a model that gives the most accurate forecasts, is it necessary to check the model AIC to determine the optimal lag, when you can instead iteratively check the forecast RMSE at all candidate lags and pick the lag that gives the lowest RMSE? Also, is it unusual that the model with the lowest RMSE does not necessarily have the minimum AIC?

Best Answer

Is it necessary to check the model AIC? No. But it can be a good idea.

  • AIC requires fitting the model only once while measures based on splitting the dataset may require fitting it multiple times (as in time series cross validation using rolling windows).*
  • AIC uses the entire dataset for training and thus achieves the highest possible estimation precision. Measures based on splitting the dataset into training and test subsets, by contrast, estimate the model on only part of the data, which can become particularly problematic in smaller samples; see "AIC versus cross validation in time series: the small sample case".*
  • However, if your evaluation loss function is very different from the loss function implied by the maximum likelihood estimator (based on which AIC is calculated), then it may be more reasonable to use measures based on splitting the dataset into training and test subsets and based on the evaluation loss function of interest; see "Equivalence of AIC and LOOCV under mismatched loss functions" and "Optimality of AIC w.r.t. loss functions used for evaluation".

*Meanwhile, using RMSE without splitting the dataset (that is, in-sample RMSE) will simply select the most flexible of the candidate models, which in all likelihood means overfitting.
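
To make the comparison concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an illustrative assumption rather than part of the question: the simulated AR(2) series, the helper names (`simulate_ar2`, `fit_ar`, `gaussian_aic`, `rolling_rmse`), the maximum lag of 6, and the choice of 150 as the first forecast origin. It fits AR(p) models by ordinary least squares and reports three criteria per lag: in-sample RMSE, a Gaussian AIC (one fit per lag), and a rolling-origin one-step-ahead RMSE (one refit per forecast origin).

```python
import math
import random

def simulate_ar2(n, phi1=0.6, phi2=-0.3, seed=0):
    """Simulate an AR(2) series with standard normal noise (hypothetical data)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + 100):  # the first 100 draws act as burn-in
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[-n:]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_ar(y, p, start):
    """OLS fit of AR(p) with intercept on targets y[start:].

    Using the same `start` for every p keeps the estimation sample identical
    across lags, so RSS and AIC values are comparable. Returns (beta, rss, n_obs).
    """
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(start, len(y))]
    targets = y[start:]
    d = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(d)]
    beta = solve(xtx, xty)
    rss = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, v in zip(rows, targets))
    return beta, rss, len(targets)

def gaussian_aic(rss, n_obs, p):
    """Gaussian AIC up to a constant that is identical for all lags.

    Parameter count: p lag coefficients + intercept + noise variance.
    """
    return n_obs * math.log(rss / n_obs) + 2 * (p + 2)

def rolling_rmse(y, p, max_lag, first_origin):
    """One-step-ahead RMSE with an expanding window: refit before every forecast."""
    sq_errors = []
    for t in range(first_origin, len(y)):
        beta, _, _ = fit_ar(y[:t], p, max_lag)  # only data before time t
        pred = beta[0] + sum(beta[k] * y[t - k] for k in range(1, p + 1))
        sq_errors.append((y[t] - pred) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

y = simulate_ar2(200)
max_lag = 6
in_sample_rmse, aic, cv_rmse = {}, {}, {}
for p in range(1, max_lag + 1):
    _, rss, n_obs = fit_ar(y, p, max_lag)
    in_sample_rmse[p] = math.sqrt(rss / n_obs)
    aic[p] = gaussian_aic(rss, n_obs, p)
    cv_rmse[p] = rolling_rmse(y, p, max_lag, first_origin=150)
    print(f"p={p}  in-sample RMSE={in_sample_rmse[p]:.4f}  "
          f"AIC={aic[p]:.2f}  rolling RMSE={cv_rmse[p]:.4f}")

best_aic_lag = min(aic, key=aic.get)
best_cv_lag = min(cv_rmse, key=cv_rmse.get)
print("lag chosen by AIC:", best_aic_lag)
print("lag chosen by rolling-origin RMSE:", best_cv_lag)
```

Because the AR(p) models are nested and fitted on the same sample, the in-sample RMSE can only stay flat or fall as p grows, which is the overfitting trap described in the footnote. The AIC column needs one fit per lag, whereas the rolling-origin column refits the model at every forecast origin. The two selected lags may or may not coincide on a given series, so a disagreement between them is not in itself a sign of a mistake.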