Solved – Determine best ARIMA model with AICc and RMSE

aic, arima, forecasting, rms, time series

I split my data into a training set, on which I fit several ARIMA models, and a test set, on which I assess their performance (in R). From what I understand, I can use the AICc to pick the best model by choosing the one with the smallest AICc, but the models must share the same differencing order for the comparison to be valid. I can also use the RMSE to choose the best model, in which case different differencing orders don't matter. In any case, all my models have d = 1.

If small values of AICc tend to indicate better models, and a smaller RMSE also indicates a better model, shouldn't the models with the smallest AICc also have the smallest RMSE? In my case, models with smaller AICc have larger RMSE than models with larger AICc. How should I decide which model is best?
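For reference, the two criteria can be written out as follows. This is a sketch under the usual textbook definitions, not something stated in the question: $\hat{L}$ is the maximized likelihood on the training set, $k$ the number of estimated parameters (assumed here to include the innovation variance), $n$ the number of observations used in the fit, $T$ the last training time point, and $h$ the length of the test set.

```latex
\mathrm{AICc} = -2\log\hat{L} + 2k + \frac{2k(k+1)}{n-k-1}
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{h}\sum_{i=1}^{h}\left(y_{T+i} - \hat{y}_{T+i\mid T}\right)^{2}}
```

The AICc is computed entirely from the training data, whereas the RMSE in the table is presumably computed on the held-out test set, so the two are measured on different samples and need not rank models in the same order.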

Below are the ARIMA models I fitted, each with its AICc, the p-value of the Ljung-Box test on its residuals, the RMSE and the MAPE.

 Model           AICc     Ljung-Box p-value   RMSE     MAPE
 ARIMA(2,1,2)    515.28   0.07054             1.1537   13.812
 ARIMA(2,1,1)    517.91   0.1145              1.0441   13.925
 ARIMA(1,1,2)    517.9    0.1169              1.0667   14.217
 ARIMA(1,1,1)    516.22   0.1732              1.1122   14.848
 ARIMA(2,1,0)    537.3    0.0074              0.9066   12.083
 ARIMA(0,1,2)    519.59   0.1004              0.9431   12.676
 ARIMA(0,1,1)    537.5    0.0007              0.9030   12.006
 ARIMA(1,1,0)    544.32   0.0006              0.8961   11.735
 ARIMA(0,1,0)    549.08   0.0006              0.8963   11.747
 ARIMA(3,1,2)    521.84   0.0368              1.0181   13.527
 ARIMA(2,1,3)    521.6    0.0432              1.0275   13.632
 ARIMA(3,1,3)    511.6    0.1617              1.0945   14.699
 ARIMA(3,1,1)    519.91   0.0800              1.1116   14.815
 ARIMA(1,1,3)    519.78   0.05345             0.9913   13.191
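To make the disagreement between the two criteria concrete, the table can be loaded into a data frame and ranked both ways. This is only a sketch: the values are copied from the table above, while the column names (`LB_p` and so on) and the 0.05 screening threshold are choices made for illustration.

```r
# Values transcribed from the table in the question.
tab <- data.frame(
  model = c("ARIMA(2,1,2)", "ARIMA(2,1,1)", "ARIMA(1,1,2)", "ARIMA(1,1,1)",
            "ARIMA(2,1,0)", "ARIMA(0,1,2)", "ARIMA(0,1,1)", "ARIMA(1,1,0)",
            "ARIMA(0,1,0)", "ARIMA(3,1,2)", "ARIMA(2,1,3)", "ARIMA(3,1,3)",
            "ARIMA(3,1,1)", "ARIMA(1,1,3)"),
  AICc = c(515.28, 517.91, 517.9, 516.22, 537.3, 519.59, 537.5,
           544.32, 549.08, 521.84, 521.6, 511.6, 519.91, 519.78),
  LB_p = c(0.07054, 0.1145, 0.1169, 0.1732, 0.0074, 0.1004, 0.0007,
           0.0006, 0.0006, 0.0368, 0.0432, 0.1617, 0.0800, 0.05345),
  RMSE = c(1.1537, 1.0441, 1.0667, 1.1122, 0.9066, 0.9431, 0.9030,
           0.8961, 0.8963, 1.0181, 1.0275, 1.0945, 1.1116, 0.9913),
  MAPE = c(13.812, 13.925, 14.217, 14.848, 12.083, 12.676, 12.006,
           11.735, 11.747, 13.527, 13.632, 14.699, 14.815, 13.191),
  stringsAsFactors = FALSE
)

# Rank correlation between the in-sample criterion and the test-set error.
rho <- cor(tab$AICc, tab$RMSE, method = "spearman")
print(rho)

# Keep only models whose residuals pass the Ljung-Box test at the 5% level,
# then order the survivors by AICc.
keep <- subset(tab, LB_p >= 0.05)
print(keep[order(keep$AICc), ])

# Model with the lowest test-set RMSE, regardless of the residual check.
print(tab[which.min(tab$RMSE), ])
```

On these numbers the rank correlation is negative, the lowest AICc among the models that pass the Ljung-Box screen belongs to ARIMA(3,1,3), and the lowest RMSE belongs to ARIMA(1,1,0), which fails the screen.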

I should add that auto.arima() with stepwise=FALSE, approximation=FALSE and seasonal=FALSE chose ARIMA(2,1,2), but it produces NaNs.

Should I start by rejecting the models whose p-value is below 0.05? And how should I then decide on the best model? Which model would you choose given these values?
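The workflow described in the question (fit on a training set, compute AICc and a Ljung-Box p-value, then score forecasts on a test set) can be sketched roughly as below. This is an illustration under several assumptions rather than the asker's actual code: it uses base R's `arima()` instead of the forecast package, computes the AICc by hand with the innovation variance counted as a parameter, runs on a simulated series, and the helper name `evaluate_order` and the choice of 10 lags are hypothetical.

```r
set.seed(1)

# Hypothetical stand-in data: a simulated ARIMA(1,1,1) series, shifted upward
# so that percentage errors are well defined.
y <- 200 + as.numeric(arima.sim(model = list(order = c(1, 1, 1),
                                             ar = 0.5, ma = 0.3), n = 120))
train <- y[1:100]
test  <- y[101:length(y)]

evaluate_order <- function(order, train, test, lag = 10) {
  fit <- arima(train, order = order, method = "ML")

  # AICc = AIC + small-sample correction; k counts the ARMA coefficients
  # plus the innovation variance, n is the number of observations used.
  k <- length(coef(fit)) + 1
  n <- fit$nobs
  aicc <- fit$aic + 2 * k * (k + 1) / (n - k - 1)

  # Ljung-Box test on the residuals, with degrees of freedom reduced by p + q.
  lb <- Box.test(residuals(fit), lag = lag, type = "Ljung-Box",
                 fitdf = order[1] + order[3])

  # Forecast the whole test period from the end of the training set.
  pred <- as.numeric(predict(fit, n.ahead = length(test))$pred)

  data.frame(
    model = sprintf("ARIMA(%d,%d,%d)", order[1], order[2], order[3]),
    AICc  = aicc,
    LB_p  = lb$p.value,
    RMSE  = sqrt(mean((test - pred)^2)),
    MAPE  = 100 * mean(abs((test - pred) / test))
  )
}

orders <- list(c(0, 1, 0), c(1, 1, 0), c(0, 1, 1), c(1, 1, 1), c(2, 1, 0))
results <- do.call(rbind, lapply(orders, evaluate_order,
                                 train = train, test = test))

# Candidates ordered by the in-sample criterion.
print(results[order(results$AICc), ])
```

Because the RMSE and MAPE here come from a single forecast origin and a short test window, they can be quite noisy, which is one reason they may disagree with the AICc ranking.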

Best Answer

The AIC should be calculated from the residuals of a model that controls for intervention administration. Otherwise the intervention effects are treated as Gaussian noise, which underestimates the model's actual autoregressive effect and therefore miscalculates the model parameters. That leads directly to an incorrect error sum of squares and ultimately to an incorrect AIC. Most SE responders do not point out this assumption when they promote simple descriptive statistics such as AIC and RMSE.

The quick answer is that you should use neither unless you also address the question of identifying and remedying the effects of unspecified deterministic/exogenous structure.

See @AdamO's insightful response to the question "Interrupted Time Series Analysis - ARIMAX for High Frequency Biological Data?":

"The correlogram should be calculated from residuals using a model that controls for intervention administration, otherwise the intervention effects are taken to be Gaussian noise, underestimating the actual autoregressive effect."
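The point about controlling for interventions can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of any method in the answer: the series, the pulse times and the pulse size are all hypothetical, and the intervention is modeled as a known 0/1 regressor passed to base R's `arima()` through `xreg`.

```r
set.seed(42)
n <- 150

# Hypothetical intervention: one-off pulses at three known time points.
pulse <- numeric(n)
pulse[c(40, 80, 120)] <- 1

# AR(1) noise plus a constant level plus the pulse effect.
noise <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 10 + 8 * pulse + noise

# Same AR(1) structure, fitted without and with the intervention regressor.
fit_plain <- arima(y, order = c(1, 0, 0), method = "ML")
fit_xreg  <- arima(y, order = c(1, 0, 0), xreg = cbind(pulse = pulse),
                   method = "ML")

# AIC and estimated AR coefficient for each fit.
print(c(plain = fit_plain$aic, with_pulse = fit_xreg$aic))
print(c(plain = unname(coef(fit_plain)["ar1"]),
        with_pulse = unname(coef(fit_xreg)["ar1"])))

# Correlogram of the residuals from the model that controls for the pulses.
print(acf(residuals(fit_xreg), plot = FALSE))
```

In a setup like this, the fit that ignores the pulses has to absorb them into the noise term, so its AIC and AR estimate describe a different, misspecified model than the fit with the regressor.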