Time-Series Forecasting – Optimizing Ensemble Time Series Models

arima · ensemble-learning · exponential-smoothing · forecasting · time-series

I need to automate time-series forecasting, and I don't know in advance the characteristics of the series (seasonality, trend, noise, etc.).

My aim is not to get the best possible model for each series, but to avoid really bad models. In other words, not achieving the smallest possible error every time is acceptable, but getting a big error once in a while is not.

I thought I could achieve this by combining models calculated with different techniques.

That is, although ARIMA might be the best approach for one series, it may not be the best for another; the same goes for exponential smoothing.

However, if I combine one model from each technique, then even if one of them isn't very good, the other should pull the estimate closer to the real value.

It is well known that ARIMA works better on long-term, well-behaved series, while exponential smoothing stands out on short-term, noisy series.

  • My idea is to combine models generated with both techniques in order to get more robust forecasts. Does that make sense?

There might be many ways to combine those models.

  • If this is a good approach, how should I combine them?

A simple mean of the forecasts is one option, but maybe I could get better predictions if I weighted the mean according to some goodness measure of each model.
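To make the idea concrete, below is a rough sketch of what I have in mind, using statsmodels' ARIMA and Holt-Winters implementations. The (1,1,1) order, the additive trend, the 12-step holdout and the inverse-MSE weighting are placeholder choices of mine, not recommendations:

```python
# Rough sketch: fit one ARIMA and one exponential-smoothing model per series,
# then combine their forecasts by a simple mean and by an inverse-error-weighted
# mean. Model orders and the holdout length are illustrative placeholders.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def combined_forecast(y, h=12, holdout=12):
    """Return (simple-mean, weighted-mean) h-step-ahead forecasts for series y."""
    train, valid = y[:-holdout], y[-holdout:]

    # Fit both models on the training part of the series
    arima_fit = ARIMA(train, order=(1, 1, 1)).fit()
    es_fit = ExponentialSmoothing(train, trend="add").fit()

    # Goodness measure: mean squared error on the holdout window
    mse = np.array([
        np.mean((arima_fit.forecast(holdout) - valid) ** 2),
        np.mean((es_fit.forecast(holdout) - valid) ** 2),
    ])
    weights = (1 / mse) / (1 / mse).sum()  # better holdout fit -> larger weight

    # Refit on the full series and combine the h-step-ahead point forecasts
    forecasts = np.vstack([
        ARIMA(y, order=(1, 1, 1)).fit().forecast(h),
        ExponentialSmoothing(y, trend="add").fit().forecast(h),
    ])
    return forecasts.mean(axis=0), weights @ forecasts
```

With equal weights this is just the simple average; the inverse-MSE weights are only one of many possible goodness measures.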

  • How should the variance be treated when combining models?

Best Answer

Combining forecasts is an excellent idea. (I think it is not an exaggeration to say that this is one of the few things academic forecasters agree on.)

I happen to have written a paper a while back looking at different ways to weight forecasts when combining them: http://www.sciencedirect.com/science/article/pii/S0169207010001032 Basically, using (Akaike) weights did not consistently improve combinations over simple or trimmed/winsorized means or medians, so I personally would think twice before implementing a complex procedure that may not yield a definite benefit (recall, though, that combinations consistently outperformed selecting single methods by information criteria). This may depend on the particular time series you have, of course.
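For illustration only (this is not code from the paper), here is a sketch of the combination operators mentioned above, assuming you already have an array of candidate point forecasts and one information-criterion value per model (e.g. results.aic from statsmodels); the 10% trim proportion is arbitrary:

```python
# Sketch of several forecast-combination operators: Akaike weights versus
# simple mean, median and trimmed mean. Inputs are assumed to come from
# already-fitted models; nothing here is specific to ARIMA or ETS.
import numpy as np
from scipy.stats import trim_mean

def combine(forecasts, aics, trim=0.1):
    """forecasts: (n_models, h) point forecasts; aics: one AIC per model."""
    forecasts = np.asarray(forecasts, dtype=float)
    aics = np.asarray(aics, dtype=float)

    # Akaike weights: exp(-0.5 * delta_AIC), normalised to sum to one
    delta = aics - aics.min()
    w = np.exp(-0.5 * delta)
    w /= w.sum()

    return {
        "aic_weighted": w @ forecasts,                       # information-criterion weights
        "mean": forecasts.mean(axis=0),                      # simple average
        "median": np.median(forecasts, axis=0),              # median combination
        "trimmed_mean": trim_mean(forecasts, trim, axis=0),  # trimmed mean
    }
```

The finding above suggests the last three are often hard to beat, which is convenient, since they need no bookkeeping beyond the forecasts themselves.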

I looked at combining prediction intervals in the paper above, but not at combining variances as such. I seem to recall a paper not long ago in the International Journal of Forecasting (IJF) with this focus, so you may want to search back issues of the IJF for "combining" or "combination".
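As a crude starting point on the variance question (this is just the standard identity for a linear combination, not a result from that literature): if the combined point forecast uses weights $w_1 + w_2 = 1$, the combined forecast error is $e_c = w_1 e_1 + w_2 e_2$, so

$$\operatorname{Var}(e_c) = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho\,\sigma_1\sigma_2,$$

where $\sigma_i^2$ are the individual forecast-error variances and $\rho$ the correlation between the errors. With equal weights and uncorrelated errors this reduces to $(\sigma_1^2 + \sigma_2^2)/4$; but errors from models fitted to the same series are usually positively correlated, so assuming independence will typically understate the combined variance.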

A few other papers that have looked at combining forecasts are here (from 1989, but a review), here, here (which also looks at densities), here and here. Many of these note that it is still poorly understood why forecast combinations frequently outperform single selected models. The second-to-last paper is on the M3 forecasting competition; one of its main findings (number (3) on p. 458) was that "The accuracy of the combination of various methods outperforms, on average, the specific methods being combined and does well in comparison with other methods." The last of these papers finds that combinations do not necessarily perform better than single models, but that they can considerably reduce the risk of catastrophic failure (which is one of your goals). More literature should readily be found in the International Journal of Forecasting, the Journal of Forecasting, and, for more specific applications, in the econometrics or supply chain literature.