Should I make a time series stationary before passing it as input to an ARIMA model?

arima, time series

I'm working through this tutorial, in which the author fits a SARIMAX model to a time series that has both seasonal and trend components:

# Create a SARIMAX model with previously determined lags
from statsmodels.tsa.statespace import sarimax

sar_m = sarimax.SARIMAX(ts15_train.values,
                        trend='n',
                        order=(2, 1, 1),
                        seasonal_order=(2, 1, 1, 24),
                        simple_differencing=False).fit()

This seems wrong to me. Am I right? From the tutorials I have read, I believe we should first eliminate both the seasonal and trend components to make the time series stationary (by applying transformations such as ts = log(ts)), then predict the next K values (e.g. with an ARIMA model), and finally bring the seasonal and trend components back (e.g. add a running mean, apply pow(2, x)).

Best Answer

No, what you are suggesting is almost entirely incorrect.

For forecasting, "eliminating" trend and seasonal components in the training period puts you in the awkward position of "bringing back" the trend and seasonal components in the forecast period. If your model doesn't detrend/seasonally adjust by defining dynamics for those components (which is the case with many such methods), you have no forecast available for those components; you have to make some kind of ad hoc decision as to what it should be (e.g. copy the last year's seasonal component), or fit another, separate model to those components.

Seasonal ARIMA models explicitly include a type of stochastic trend (if the differencing order is at least 1) and seasonality, which means that they are perfectly capable of accounting for those, including in the forecast period, and there is no reason to remove them first. In particular, ARIMA models are not assumed to be stationary. They are assumed to have a particular type of non-stationarity, though, so they will not, for example, handle error variance that increases with the level.

At the end of your post, you seem to be suggesting having a deterministic trend ("pow(2,x)"?), which should also be done jointly with estimating the ARIMA model (an ARIMAX or regression with ARIMA errors model), not as a separate step. This is because the parts of a model are rarely independent: your estimate of the trend parameters depends on your estimate for the ARIMA parameters.