Solved – ARIMA real time predictions

arima, forecasting, time series

I have trained an ARIMA model in Python (specifically SARIMAX, since I need seasonal ARIMA). This gives me a model that I can use to forecast one or more future values.

What would be the correct procedure to create real time predictions? Should I:

  • retrain the ARIMA model each time a new value arrives and forecast one value
  • retrain the ARIMA model every N time steps and predict the values for the next N time steps
  • something else?

Also, if I retrain the model, should I:

  • use all previous values
  • use last N values

If I should use only the last N values, how do I determine the size of N?

I'm familiar with the idea of fitting a model periodically (but not too often) and applying it to real-time values to predict the dependent variable, but for time series it seems that the model needs to be retrained more often to get better predictions.

Best Answer

If you need only one-step-ahead forecasts, refit the model each time a new observation arrives and compute the forecast from the refitted model. Use all historical data; there is no reason not to. (Using only the last $N$ observations discards the distant past, just as leaving out the most recent observations discards the recent past.)

If you need $h$-step-ahead predictions, the latest available prediction for any period will necessarily come from a model whose most recent observation is $h$ periods old. You should still refit your model each time you get new data.

Of course, if you are constrained by computing capacity, it makes sense to weigh this advice against the small improvement it gets you, and you might choose to update only every $k$ periods. A similar consideration applies if your forecast consumer is unhappy when the forecast changes appreciably every period, e.g., because different ARIMA models were selected yesterday and today (a change in the order of integration will be most visible). Here, a tradeoff between accuracy and acceptance might mean that you refit only every $k$ periods, or that you refit the parameters every period but allow the model specification to change only every $k$ periods.

If your time series had a structural change in the past, it may make sense to start the history only after that change, because ARIMA does not handle structural changes well. Alternatively, you could use dummy regressors to capture the structural change, using ARIMAX or regression with ARIMA errors.