Solved – Selecting between two ARIMA models

arima, forecasting, model-selection

I have a monthly data set taken from datamarket. I have applied two different ARIMA models with different seasonal periods in R. The estimation results are reported below.

Model 1:

ARIMA(3,1,1)(0,1,1)[35]                    

Coefficients:
         ar1     ar2     ar3      ma1     sma1
      0.5363  0.0365  0.0545  -0.9199  -0.8472
s.e.  0.0903  0.0787  0.0754   0.0614   0.1530

sigma^2 estimated as 874694:  log likelihood=-1959.58
AIC=3931.17   AICc=3931.53   BIC=3951.92
                   ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 6.818351 861.6035 445.7387 -3.906734 13.14426 0.5771561 0.004052349

Model 2:

ARIMA(3,1,1)(1,1,1)[23]                    

Coefficients:
         ar1     ar2     ar3     ma1    sar1     sma1
      0.5161  0.1210  0.0326  -0.937  0.0515  -0.9359
s.e.  0.0832  0.0757  0.0741   0.057  0.0956   0.2221

sigma^2 estimated as 820158:  log likelihood=-2049.93
AIC=4113.85   AICc=4114.32   BIC=4138.42
                   ME     RMSE      MAE       MPE     MAPE     MASE         ACF1
Training set 12.01683 854.0288 456.0118 -3.864165 13.66146 0.590458 0.0005883881

Given these results, I am having trouble choosing between them. One has a better RMSE, but the other has better MAE and MAPE.
How should I interpret these results, and which model should be chosen for better forecasts?

Best Answer

The measures ME, RMSE, MAE, MPE, MAPE, and MASE reported in the model output are in-sample measures. They are not robust to overfitting as you can improve them simply by fitting a richer model. Therefore, they should not be central in guiding the model choice.
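
If you have enough data, a simple way to judge forecasting performance is to hold out the last part of the series and compare the models on it. Below is a minimal sketch using the forecast package, assuming your raw series is stored in a variable y (a name made up here) and holding out the last 24 observations purely for illustration:

# Minimal out-of-sample comparison sketch; `y` and the 24-observation
# holdout are assumptions, not part of the original question.
library(forecast)

n <- length(y)
h <- 24
train <- y[1:(n - h)]          # estimation sample
test  <- y[(n - h + 1):n]      # holdout sample

# Refit the two specifications from the question on the training part only
fit1 <- Arima(ts(train, frequency = 35), order = c(3, 1, 1), seasonal = c(0, 1, 1))
fit2 <- Arima(ts(train, frequency = 23), order = c(3, 1, 1), seasonal = c(1, 1, 1))

# The "Test set" rows of the accuracy output are the ones to compare
accuracy(forecast(fit1, h = h), test)
accuracy(forecast(fit2, h = h), test)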

Meanwhile, AIC, AICc and BIC are robust to overfitting, as long as you are not comparing too many models at once (see Hansen, "A winner's curse for econometric models: on the joint distribution of in-sample fit and out-of-sample fit and its implications for model selection", 2010).
AIC and AICc target one-step-ahead predictions. (AICc offers an improvement over AIC in small samples, so you could just ignore AIC and stick to AICc.) If you want to select the model that should be better at forecasting (which seems to be your goal), look for the one with the lowest AIC and AICc values.
Meanwhile, BIC may select the true model if it is among the candidate models. The true model need not be the one that predicts best (paradoxical as it may sound), but sometimes you are just interested in how the data were generated. In that case, look for the model with the lowest BIC value.
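
For instance, if the two fitted objects from the forecast package are stored as fit1 and fit2 (hypothetical names), the criteria can be collected side by side, subject to the comparability caveat below:

# Collect the information criteria from the two fitted Arima objects
# (`fit1`, `fit2` are assumed names for the models in the question).
data.frame(
  model = c("ARIMA(3,1,1)(0,1,1)[35]", "ARIMA(3,1,1)(1,1,1)[23]"),
  AIC   = c(fit1$aic,  fit2$aic),
  AICc  = c(fit1$aicc, fit2$aicc),
  BIC   = c(fit1$bic,  fit2$bic)
)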

However, for AIC, AICc and BIC to be directly comparable across models, the dependent variable must be exactly the same across the models. I suspect that is not the case here. Both models include seasonal differencing, but the seasonal periods differ (23 vs. 35), so the differenced series the model is actually fit on is longer for the period-23 model than for the period-35 model.
What you could do to circumvent this is to cut the first 12 observations of the series for the model with period 23, so that both models are effectively fit on the same sample. Then the AIC, AICc and BIC should be comparable.
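
A sketch of that adjustment, again assuming the raw series is stored as y: the period-35 model loses 1 + 35 = 36 observations to differencing, while the period-23 model loses only 1 + 23 = 24, so dropping the first 12 observations before fitting the period-23 model aligns the effective samples.

# Align the effective estimation samples (assumes the raw series is `y`).
library(forecast)

fit35 <- Arima(ts(y, frequency = 35),
               order = c(3, 1, 1), seasonal = c(0, 1, 1))
fit23 <- Arima(ts(y[-(1:12)], frequency = 23),   # drop the first 12 observations
               order = c(3, 1, 1), seasonal = c(1, 1, 1))

# With the same effective sample, these are now comparable
c(AICc_35 = fit35$aicc, AICc_23 = fit23$aicc)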
