I'm trying to fit a model to a time series, but I'm confused about which one is best.
I'm looking at an ARIMA model, an ETS model and an stlf model, each of which performed best within its own family of models. When I compare rolling forecast errors for 6-month forecasts, they perform equally well: each model has the smallest error exactly one third of the time.
I then looked at other criteria such as AIC, AICc and BIC and got the results below. My real problem is the scale of the information criteria: they are smaller by about a factor of a hundred for the stlf model. Is it really that much better, or is something else at play here?
#The arima model:
Series: myts
ARIMA(0,1,0)(1,0,0)[12]
Coefficients:
        sar1
      0.8394
s.e.  0.0704
sigma^2 estimated as 19456: log likelihood=-229.81
AIC=463.61 AICc=463.99 BIC=466.72
#The ets model:
ETS(M,N,M)
Call:
ets(y = myts)
Smoothing parameters:
alpha = 0.5505
gamma = 1e-04
Initial states:
l = 500.5273
s=0.5977 0.3134 0.298 0.5218 1.6367 2.0899
2.1506 2.2123 0.8724 0.5279 0.4086 0.3708
sigma: 0.1507
AIC AICc BIC
438.9330 458.9330 461.1023
#The stlf model:
ETS(A,N,N)
Call:
ets(y = x, model = etsmodel)
Smoothing parameters:
alpha = 0.483
Initial states:
l = 6.0707
sigma: 0.1587
AIC AICc BIC
0.4533825 0.8170189 3.6204204
Can they be compared at all? I seem to remember that these criteria can only be compared between different models under certain conditions.
Best Answer
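You can't compare information criteria between different fitting methods. AIC and friends are built on the log-likelihood, which includes an additive constant that different fitting algorithms set to different values (or drop entirely), and they are only meaningful across models fitted to the same data. You can compare AICs for different models fitted by the same method to the same series. Note also that your stlf output is an ets fit to a different series (y = x, with an initial level near 6 rather than 500); stlf applies ETS to the seasonally adjusted data, so that likelihood is on a different scale altogether. So no help there.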
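To make the point about the constant concrete, here is a minimal sketch in plain Python. It assumes i.i.d. Gaussian errors, and the residuals, parameter counts and the `gaussian_aic` helper are all made up for illustration; it is not how any particular R function computes its AIC. It computes AIC under two conventions, one keeping the `n*log(2*pi)` term of the log-likelihood and one dropping it.

```python
import math

def gaussian_aic(residuals, n_params, include_constant=True):
    """AIC = 2k - 2*logL for i.i.d. Gaussian errors (hypothetical helper).

    The variance is replaced by its maximum-likelihood estimate, so
    logL = -n/2 * (log(2*pi) + log(sigma2) + 1).
    Some software drops the log(2*pi) term; include_constant toggles that.
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    loglik = -0.5 * n * (math.log(sigma2) + 1.0)
    if include_constant:
        loglik -= 0.5 * n * math.log(2.0 * math.pi)
    return 2 * n_params - 2 * loglik

# Made-up in-sample residuals from two hypothetical models on the same data.
res_a = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.7, -1.0]
res_b = [1.5, -1.3, 0.9, -1.6, 1.1, -0.7, 1.2, -1.4]

full_a = gaussian_aic(res_a, n_params=2, include_constant=True)
full_b = gaussian_aic(res_b, n_params=1, include_constant=True)
short_a = gaussian_aic(res_a, n_params=2, include_constant=False)
short_b = gaussian_aic(res_b, n_params=1, include_constant=False)

# Within one convention the difference between models is the same...
within_full = full_a - full_b
within_short = short_a - short_b
# ...but the same model differs across conventions by n*log(2*pi).
offset = full_a - short_a

print("A minus B, constant kept:   ", within_full)
print("A minus B, constant dropped:", within_short)
print("same model, two conventions:", offset)
```

The difference between the two models is identical under either convention, while the same model shifts by a fixed amount between conventions. Mixing conventions (or data sets) therefore produces gaps that say nothing about model quality.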
Looking at rolling out-of-sample forecasts was exactly the right thing to do. You now know that each model is best one third of the time, but a win count says nothing about the size of the misses. Look at the magnitude of the errors as well (MAD or MSE): perhaps one model is usually close but occasionally yields forecasts that are far too low or far too high.
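A small sketch of that comparison, in plain Python. The error values are invented for illustration (they are not from the question), MAD is taken here to mean the mean absolute error, and the dictionary layout is just an assumption about how the rolling-origin errors might be stored.

```python
from statistics import mean

def mad(errors):
    """Mean absolute deviation of the forecast errors (mean absolute error)."""
    return mean(abs(e) for e in errors)

def mse(errors):
    """Mean squared forecast error."""
    return mean(e * e for e in errors)

# Hypothetical 6-month-ahead errors (actual minus forecast), one value per
# rolling forecast origin. Invented numbers, for illustration only.
errors = {
    "arima": [10, -12, 30, -35, 20, -29],
    "ets":   [15, -18, 25, -30, 22, -28],
    "stlf":  [12, -15, 28, -120, 18, -27],
}

# Count how often each model has the smallest absolute error.
n_origins = len(errors["arima"])
wins = {name: 0 for name in errors}
for i in range(n_origins):
    best = min(errors, key=lambda name: abs(errors[name][i]))
    wins[best] += 1

# Summarise the size of the errors, not just who won.
summary = {
    name: {"MAD": mad(e), "MSE": mse(e), "max_abs": max(abs(x) for x in e)}
    for name, e in errors.items()
}

print("wins:", wins)
for name, stats in summary.items():
    print(name, stats)
```

In this made-up data every model wins equally often, yet one of them has a single very large miss that dominates its MSE. That is exactly the situation a win count cannot reveal.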
Failing that, it may well be that all three methods are equally good.
One useful trick for improving forecast accuracy: calculate forecasts from all three methods and average them for each future time period. Averaging forecasts, in particular from "very different" methods, usually improves accuracy and also reduces error variance.
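The averaging step itself is simple. A minimal sketch in plain Python, assuming each method's point forecasts are already available as equal-length lists (the numbers and the `average_forecasts` helper are hypothetical):

```python
def average_forecasts(forecasts):
    """Equal-weight average of several forecasts, period by period.

    `forecasts` maps a method name to its list of point forecasts; all
    lists must cover the same future periods in the same order.
    """
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must have the same horizon")
    return [sum(values) / len(values) for values in zip(*forecasts.values())]

# Hypothetical point forecasts for the next three months.
forecasts = {
    "arima": [520.0, 610.0, 480.0],
    "ets":   [500.0, 640.0, 450.0],
    "stlf":  [510.0, 595.0, 495.0],
}

combined = average_forecasts(forecasts)
print(combined)
```

Equal weights are the simplest choice and a reasonable default when, as here, no single method clearly outperforms the others. The combined forecast can be scored with the same rolling out-of-sample procedure as the individual models.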