Solved – Selecting between two ARIMA models

arima, forecasting, model-selection

I have a monthly data set taken from datamarket. I have applied two different ARIMA models with different seasonal periods in R. The estimation results are reported below.

Model 1:

ARIMA(3,1,1)(0,1,1)[35]                    

Coefficients:
         ar1     ar2     ar3      ma1     sma1
      0.5363  0.0365  0.0545  -0.9199  -0.8472
s.e.  0.0903  0.0787  0.0754   0.0614   0.1530

sigma^2 estimated as 874694:  log likelihood=-1959.58
AIC=3931.17   AICc=3931.53   BIC=3951.92
                   ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 6.818351 861.6035 445.7387 -3.906734 13.14426 0.5771561 0.004052349

Model 2:

ARIMA(3,1,1)(1,1,1)[23]                    

Coefficients:
         ar1     ar2     ar3     ma1    sar1     sma1
      0.5161  0.1210  0.0326  -0.937  0.0515  -0.9359
s.e.  0.0832  0.0757  0.0741   0.057  0.0956   0.2221

sigma^2 estimated as 820158:  log likelihood=-2049.93
AIC=4113.85   AICc=4114.32   BIC=4138.42
                   ME     RMSE      MAE       MPE     MAPE     MASE         ACF1
Training set 12.01683 854.0288 456.0118 -3.864165 13.66146 0.590458 0.0005883881

Given these results, I am having trouble choosing between them. One has a better RMSE, but the other has better MAE and MAPE.
How should I interpret these results, and which model should be chosen for better forecasts?

Best Answer

The measures ME, RMSE, MAE, MPE, MAPE, and MASE reported in the model output are in-sample measures. They are not robust to overfitting as you can improve them simply by fitting a richer model. Therefore, they should not be central in guiding the model choice.
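
If you have enough data, a simple way to judge forecasting performance is to hold out the last part of the series and compare the models on it. Below is a minimal sketch using the forecast package, assuming your raw series is stored in a variable y (a name made up here) and holding out the last 24 observations purely for illustration:

# Minimal out-of-sample comparison sketch; `y` and the 24-observation
# holdout are assumptions, not part of the original question.
library(forecast)

n <- length(y)
h <- 24
train <- y[1:(n - h)]          # estimation sample
test  <- y[(n - h + 1):n]      # holdout sample

# Refit the two specifications from the question on the training part only
fit1 <- Arima(ts(train, frequency = 35), order = c(3, 1, 1), seasonal = c(0, 1, 1))
fit2 <- Arima(ts(train, frequency = 23), order = c(3, 1, 1), seasonal = c(1, 1, 1))

# The "Test set" rows of the accuracy output are the ones to compare
accuracy(forecast(fit1, h = h), test)
accuracy(forecast(fit2, h = h), test)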

Meanwhile, AIC, AICc and BIC are robust to overfitting, as long as you are not comparing too many models at once (see Hansen, "A winner's curse for econometric models: on the joint distribution of in-sample fit and out-of-sample fit and its implications for model selection", 2010).
AIC and AICc target one-step-ahead predictions. (AICc offers an improvement over AIC in small samples, so you could just ignore AIC and stick to AICc.) If you want to select the model that should be better at forecasting (which seems to be your goal), look for the one with the lowest AIC and AICc values.
Meanwhile, BIC may select the true model if it is among the candidate models. The true model need not be the one that predicts best (paradoxical as it may sound), but sometimes you are just interested in how the data were generated. In that case, look for the model with the lowest BIC value.
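
For instance, if the two fitted objects from the forecast package are stored as fit1 and fit2 (hypothetical names), the criteria can be collected side by side, subject to the comparability caveat below:

# Collect the information criteria from the two fitted Arima objects
# (`fit1`, `fit2` are assumed names for the models in the question).
data.frame(
  model = c("ARIMA(3,1,1)(0,1,1)[35]", "ARIMA(3,1,1)(1,1,1)[23]"),
  AIC   = c(fit1$aic,  fit2$aic),
  AICc  = c(fit1$aicc, fit2$aicc),
  BIC   = c(fit1$bic,  fit2$bic)
)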

However, for AIC, AICc and BIC to be directly comparable across models, the dependent variable must be exactly the same across the models. I suspect that is not the case here. Both models include seasonal differencing, but the seasonal periods differ (23 vs. 35), so the differenced series the model is actually fit on is longer for the period-23 model than for the period-35 model.
What you could do to circumvent this is to cut the first 12 observations of the series for the model with period 23, so that both models are effectively fit on the same sample. Then the AIC, AICc and BIC should be comparable.
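
A sketch of that adjustment, again assuming the raw series is stored as y: the period-35 model loses 1 + 35 = 36 observations to differencing, while the period-23 model loses only 1 + 23 = 24, so dropping the first 12 observations before fitting the period-23 model aligns the effective samples.

# Align the effective estimation samples (assumes the raw series is `y`).
library(forecast)

fit35 <- Arima(ts(y, frequency = 35),
               order = c(3, 1, 1), seasonal = c(0, 1, 1))
fit23 <- Arima(ts(y[-(1:12)], frequency = 23),   # drop the first 12 observations
               order = c(3, 1, 1), seasonal = c(1, 1, 1))

# With the same effective sample, these are now comparable
c(AICc_35 = fit35$aicc, AICc_23 = fit23$aicc)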
