What selection criteria to use and why? (AIC, RMSE, MAPE) – All possible model selection for time series forecasting

Tags: aic, forecasting, model-selection, rms

I'm doing all-possible-models selection in SAS for time series forecasting: I fit 40 models to the data and shortlist the 'n' best models according to a selection criterion.

What criteria should I use to shortlist?

From what I've read, SBC and AIC favor the most parsimonious models, i.e. the ones with the fewest parameters (because of the penalties they apply). But if I'm fitting a time series model, I only have one independent variable (x), namely TIME.

Also I read somewhere that RMSE is highly susceptible to outliers.

Best Answer

The short answer is that there is no silver bullet. The selection criteria you have named are also far from the only ones available (as I am sure you are aware).

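So let us start with the ones most commonly used in time series applications: the Bayesian information criterion (BIC, also known as the Schwarz criterion, which SAS reports as SBC), the Akaike information criterion (AIC), and the Hannan-Quinn criterion (HQC). These criteria are typically used to select the lag length of your model (i.e., how many past periods affect the present period). That is also where the parameters come from in a time series model: each lag you include adds a coefficient to estimate, even though the only index is time.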
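For reference, the three criteria are usually written as follows. The symbols are my own notation rather than anything taken from SAS output: the maximized likelihood is \hat{L}, k is the number of estimated parameters, and n is the number of observations. Software packages sometimes rescale these or drop constants, so compare values only within one package.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\ln\hat{L} + 2k \\
\mathrm{BIC} &= -2\ln\hat{L} + k\ln n \\
\mathrm{HQC} &= -2\ln\hat{L} + 2k\ln(\ln n)
\end{aligned}
```

In all three cases smaller is better, and the only difference is how strongly each extra parameter is penalized.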

AIC is derived as an estimate of the Kullback-Leibler divergence between the fitted model and the data-generating process, while BIC and HQC apply penalties that grow with the sample size. When the candidate set contains a true model, all three asymptotically select a true model. Notice how I said 'a' true model, because including superfluous lags asymptotically makes no difference (since asymptotically, their coefficients will be estimated to be zero). It is noteworthy that AIC is not consistent: even asymptotically it selects, with positive probability, a model that is larger than the smallest true model. In machine learning terminology, it is prone to overfitting. BIC and HQC, on the other hand, select the smallest true model asymptotically. Their drawback is that they tend to select too few lags in finite samples, which is why AIC is often preferred in applications.
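The different behavior comes down to how heavily each criterion charges for one additional parameter. As a rough illustration (the function name `penalty_per_parameter` and the sample sizes are my own choices, and the penalties are the textbook ones: 2 for AIC, 2 ln ln n for HQC, ln n for BIC), the sketch below prints the per-parameter penalty for a few sample sizes:

```python
import math


def penalty_per_parameter(n):
    """Penalty added per estimated parameter at sample size n (textbook forms)."""
    return {
        "AIC": 2.0,                             # constant, independent of n
        "HQC": 2.0 * math.log(math.log(n)),     # grows very slowly with n
        "BIC": math.log(n),                     # grows with ln(n)
    }


# Sample sizes chosen purely for illustration.
for n in (20, 100, 1000):
    penalties = penalty_per_parameter(n)
    print(n, {name: round(value, 2) for name, value in penalties.items()})
```

For any sample size above roughly 16 observations the ordering is AIC < HQC < BIC, so an extra lag has to improve the likelihood more to be accepted by BIC than by AIC. That is the mechanism behind BIC and HQC choosing shorter lag lengths in small samples.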

The main problem with (unpenalized) in-sample RMSE is that extending the lag length (i.e., including more lags as explanatory variables) can never make it worse. When the models are nested and fitted to the same observations, the least-squares fit cannot deteriorate as explanatory variables are added, and RMSE is a direct measure of fit. Ranking models by in-sample RMSE alone therefore always favors the longest lag length.
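To see this effect on a small example, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated from an AR(2) process with coefficients I picked, the helper names `solve` and `ar_rss` are hypothetical, and the fit is ordinary least squares on a common estimation sample rather than whatever estimator SAS uses.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ar_rss(y, p, start):
    """OLS fit of an AR(p) with intercept on observations start..end.

    Using the same `start` for every p keeps the estimation sample identical,
    so the models are nested. Returns (residual sum of squares, sample size).
    """
    rows = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(start, len(y))]
    target = y[start:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, target))
    return rss, len(target)


# Simulated AR(2) data; the coefficients 0.6 and -0.3 are illustrative only.
random.seed(0)
y = [0.0, 0.0]
for _ in range(300):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0.0, 1.0))

MAX_LAG = 6
rmse = []
for p in range(MAX_LAG + 1):
    rss, n_obs = ar_rss(y, p, MAX_LAG)
    rmse.append(math.sqrt(rss / n_obs))

for p, value in enumerate(rmse):
    print(f"AR({p}): in-sample RMSE = {value:.4f}")
```

The printed RMSE never increases as p grows, even for lags beyond the two that actually generated the data. Note that this only holds in-sample; RMSE computed on a holdout period does not have this property.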

I don't know your exact application, but I suspect many practitioners would find the optimal lag length under each of AIC, BIC, and HQC, compare the results, and justify their chosen lag length on that basis.
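Such a comparison could look roughly like the sketch below. It is not how SAS does it: the data are simulated, the function names are hypothetical, and the AR models are fitted by Yule-Walker equations via the Levinson-Durbin recursion, with each criterion approximated as n ln(innovation variance) plus its penalty.

```python
import math
import random


def autocovariances(y, max_lag):
    """Biased sample autocovariances at lags 0..max_lag (divisor n)."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return [sum(d[t] * d[t - h] for t in range(h, n)) / n for h in range(max_lag + 1)]


def innovation_variances(gamma):
    """Levinson-Durbin recursion: Yule-Walker innovation variance for AR(0)..AR(P)."""
    variances = [gamma[0]]
    phi = []  # coefficients of the AR(p-1) fit
    for p in range(1, len(gamma)):
        num = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        kappa = num / variances[-1]  # partial autocorrelation at lag p
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        variances.append(variances[-1] * (1.0 - kappa ** 2))
    return variances


def select_lags(y, max_lag):
    """Return the lag length minimizing each (approximate) criterion."""
    n = len(y)
    variances = innovation_variances(autocovariances(y, max_lag))
    penalties = {
        "AIC": 2.0,
        "HQC": 2.0 * math.log(math.log(n)),
        "BIC": math.log(n),
    }
    chosen = {}
    for name, c in penalties.items():
        # k = p + 1 counts the p lag coefficients plus the mean.
        scores = [n * math.log(v) + c * (p + 1) for p, v in enumerate(variances)]
        chosen[name] = min(range(len(scores)), key=scores.__getitem__)
    return chosen


# Simulated AR(2) series; coefficients and length are illustrative assumptions.
random.seed(1)
series = [0.0, 0.0]
for _ in range(500):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + random.gauss(0.0, 1.0))
series = series[2:]

chosen = select_lags(series, max_lag=8)
print(chosen)
```

Because the fit term is shared and only the penalty differs, on the same data BIC can never pick more lags than HQC, and HQC never more than AIC (for the sample sizes where the penalties are ordered that way). If all three agree, the choice is easy to defend; if they disagree, the spread between the BIC and AIC choices gives a natural range of lag lengths to examine further.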