I split my series into a training set, on which I fit different ARIMA models, and a test set, on which I assess their performance (using R). As I understand it, I can use the AICc to select the best model by choosing the one with the smallest AICc, but the models must have the same differencing order for the comparison to be valid. Alternatively, I can use the RMSE to choose the best model, in which case different differencing orders don't matter. In any case, all my models have d=1.
If smaller AICc values tend to indicate better models, and a smaller RMSE also indicates a better model, shouldn't the models with the smallest AICc also have the smallest RMSE? In my case, models with smaller AICc have larger RMSE than models with larger AICc. How should I decide which model is best?
Here are the different ARIMA models with their AICc, the p-value of the Ljung-Box test on the residuals, the RMSE, and the MAPE.
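To make the comparison concrete, here is a minimal Python sketch of the three quantities involved. It assumes the standard definitions: AICc computed from the log-likelihood of the model fitted on the training set, and RMSE and MAPE computed from forecasts compared against held-out test values. The function names and the example numbers are illustrative only and are not taken from the models above. Because AICc depends only on the training fit while RMSE depends only on the test forecasts, nothing forces the two rankings to agree.

```python
import math

def aicc(loglik, k, n):
    """Corrected AIC: AIC plus a small-sample penalty.

    loglik: maximized log-likelihood on the training data
    k: number of estimated parameters (what counts as a parameter
       depends on the software; treat k here as an assumption)
    n: number of observations used in the fit
    """
    aic = -2.0 * loglik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

def rmse(actual, forecast):
    """Root mean squared error of forecasts against held-out values."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(errors) / len(errors))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

# Illustrative numbers only (not from the question's data).
example_aicc = aicc(loglik=-250.0, k=5, n=100)
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mape = mape([100.0, 200.0], [110.0, 180.0])
print(example_aicc, example_rmse, example_mape)
```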
| Model | AICc | p-value | RMSE | MAPE |
|---|---|---|---|---|
| ARIMA(2,1,2) | 515.28 | 0.07054 | 1.1537 | 13.812 |
| ARIMA(2,1,1) | 517.91 | 0.1145 | 1.0441 | 13.925 |
| ARIMA(1,1,2) | 517.9 | 0.1169 | 1.0667 | 14.217 |
| ARIMA(1,1,1) | 516.22 | 0.1732 | 1.1122 | 14.848 |
| ARIMA(2,1,0) | 537.3 | 0.0074 | 0.9066 | 12.083 |
| ARIMA(0,1,2) | 519.59 | 0.1004 | 0.9431 | 12.676 |
| ARIMA(0,1,1) | 537.5 | 0.0007 | 0.9030 | 12.006 |
| ARIMA(1,1,0) | 544.32 | 0.0006 | 0.8961 | 11.735 |
| ARIMA(0,1,0) | 549.08 | 0.0006 | 0.8963 | 11.747 |
| ARIMA(3,1,2) | 521.84 | 0.0368 | 1.0181 | 13.527 |
| ARIMA(2,1,3) | 521.6 | 0.0432 | 1.0275 | 13.632 |
| ARIMA(3,1,3) | 511.6 | 0.1617 | 1.0945 | 14.699 |
| ARIMA(3,1,1) | 519.91 | 0.0800 | 1.1116 | 14.815 |
| ARIMA(1,1,3) | 519.78 | 0.05345 | 0.9913 | 13.191 |
I should add that auto.arima() with stepwise=FALSE, approximation=FALSE and seasonal=FALSE chose ARIMA(2,1,2), but it produces NaNs.
Should I start by rejecting the models whose p-value is < 0.05? And then how should I decide on the best model? Which model would you choose given these values?
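The screening step asked about here can be sketched in a few lines of Python. The values are copied from the table in the question. The 0.05 cutoff is the conventional threshold mentioned in the question, and the tuple layout and function name are just conveniences for this sketch. It is meant to show that the two criteria pick different models even after screening, not to recommend a winner.

```python
# (order, AICc, Ljung-Box p-value, RMSE, MAPE), copied from the question's table
models = [
    ((2, 1, 2), 515.28, 0.07054, 1.1537, 13.812),
    ((2, 1, 1), 517.91, 0.1145, 1.0441, 13.925),
    ((1, 1, 2), 517.9, 0.1169, 1.0667, 14.217),
    ((1, 1, 1), 516.22, 0.1732, 1.1122, 14.848),
    ((2, 1, 0), 537.3, 0.0074, 0.9066, 12.083),
    ((0, 1, 2), 519.59, 0.1004, 0.9431, 12.676),
    ((0, 1, 1), 537.5, 0.0007, 0.9030, 12.006),
    ((1, 1, 0), 544.32, 0.0006, 0.8961, 11.735),
    ((0, 1, 0), 549.08, 0.0006, 0.8963, 11.747),
    ((3, 1, 2), 521.84, 0.0368, 1.0181, 13.527),
    ((2, 1, 3), 521.6, 0.0432, 1.0275, 13.632),
    ((3, 1, 3), 511.6, 0.1617, 1.0945, 14.699),
    ((3, 1, 1), 519.91, 0.0800, 1.1116, 14.815),
    ((1, 1, 3), 519.78, 0.05345, 0.9913, 13.191),
]

ALPHA = 0.05  # conventional significance level, as used in the question

def screen(candidates, alpha=ALPHA):
    """Keep models whose Ljung-Box test does not reject uncorrelated residuals."""
    return [m for m in candidates if m[2] >= alpha]

kept = screen(models)
by_aicc = sorted(kept, key=lambda m: m[1])  # smallest AICc first
by_rmse = sorted(kept, key=lambda m: m[3])  # smallest RMSE first

print("kept:", [m[0] for m in kept])
print("lowest AICc among kept:", by_aicc[0][0])
print("lowest RMSE among kept:", by_rmse[0][0])
```

Of the 14 candidates, 8 pass the screen. Among those, the smallest AICc belongs to ARIMA(3,1,3) and the smallest RMSE to ARIMA(0,1,2), so the disagreement between the two criteria remains after the residual check.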
The AIC should be calculated from the residuals of a model that controls for intervention administration. Otherwise, the intervention effects are taken to be Gaussian noise, which underestimates the model's actual autoregressive effect and therefore miscalculates the model parameters. This leads directly to an incorrect error sum of squares and ultimately to an incorrect AIC. Most SE responders do not point out this assumption when they promote simple descriptive statistics such as AIC and RMSE.
The quick answer is that you should use neither unless you are also addressing the question of identifying and remedying the effects of unspecified deterministic/exogenous structure.
See @AdamO's insightful response to the question "Interrupted Time Series Analysis - ARIMAX for High Frequency Biological Data?":
"The correlogram should be calculated from residuals using a model that controls for intervention administration, otherwise the intervention effects are taken to be Gaussian noise, underestimating the actual autoregressive effect."
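The effect described in the quote can be illustrated with a toy simulation. This sketch makes several assumptions that are not in the original posts: a stationary AR(1) series with coefficient 0.7, a single additive pulse of size 20 standing in for an intervention, and a known pulse location. With a known location, regressing on a pulse dummy (intercept plus dummy) is equivalent to replacing the affected observation with the mean of the others, which is what the adjustment step does. All names are hypothetical.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation (first correlogram ordinate)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def simulate_ar1(n, phi, seed):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

PULSE_AT = 150     # assumed known intervention time
PULSE_SIZE = 20.0  # assumed one-off intervention effect

clean = simulate_ar1(300, 0.7, seed=1)

# Series with an uncontrolled intervention (additive pulse).
pulsed = list(clean)
pulsed[PULSE_AT] += PULSE_SIZE

# Control for the pulse: OLS on intercept + pulse dummy leaves every other
# point unchanged and maps the pulsed point to the mean of the rest.
others = [v for i, v in enumerate(pulsed) if i != PULSE_AT]
adjusted = list(pulsed)
adjusted[PULSE_AT] = sum(others) / len(others)

r_clean = lag1_autocorr(clean)
r_pulsed = lag1_autocorr(pulsed)
r_adjusted = lag1_autocorr(adjusted)
print(r_clean, r_pulsed, r_adjusted)
```

In this setup the uncontrolled pulse inflates the variance without adding matching serial dependence, so the lag-1 autocorrelation of the pulsed series comes out lower than that of the clean series, while the adjusted series recovers a value close to the clean one. Any AIC or RMSE computed from a model fitted to the pulsed series inherits that distortion.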