time-series – How to Invert Differencing in Time Series Data for Multiple Steps Prediction?

differencing, forecasting, lstm, stationarity, time series

I have a time-series that I would like to use for predicting 36 timesteps in advance using LSTM. It is not stationary so I differenced the series by subtracting each point from the next one. My understanding is that after the prediction, I will add each step to the previous one in the original series so I obtain the final prediction inverted and then evaluate it. However, when it comes to production (future forecast), this would only work if I am forecasting 1 time step so I simply add it to the last point in the original data, but with 36 points, there will be 35 points missing, so how do I invert the predicted values in this case?

Best Answer

Say your original series is $x_t$ and the differenced series is $\Delta x_t:=x_t-x_{t-1}$. A future point $x_{t+h}$ can be expressed as $x_t+\Delta x_{t+1}+\dots+\Delta x_{t+h}$. If you can predict the increments $\Delta x_{t+1},\dots,\Delta x_{t+h}$ by $\widehat{\Delta x_{t+1}},\dots,\widehat{\Delta x_{t+h}}$, you would sum them and add the sum to $x_t$ to obtain a prediction of $x_{t+h}$: $$ \hat x_{t+h|t}=x_t+(\widehat{\Delta x_{t+1}}+\dots+\widehat{\Delta x_{t+h}}). $$
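The summation above amounts to a cumulative sum anchored at the last observed level. Here is a minimal sketch, assuming the model has already produced the predicted increments as an array; the function name `invert_differences` and the toy numbers are made up for illustration.

```python
import numpy as np

def invert_differences(last_observed, predicted_diffs):
    """Turn predicted first differences into level forecasts.

    last_observed   : x_t, the final value of the original (undifferenced) series
    predicted_diffs : predicted increments for t+1, ..., t+h, in time order
    """
    predicted_diffs = np.asarray(predicted_diffs, dtype=float)
    # Running sum of the increments, shifted by the last known level.
    return last_observed + np.cumsum(predicted_diffs)

# Toy example (hypothetical values): x_t = 100 and three predicted increments.
forecast = invert_differences(100.0, [2.0, -1.0, 0.5])
print(forecast)  # levels 102.0, 101.0, 101.5
```

For a 36-step forecast the same call works with 36 predicted increments; only the last observed value of the original series is needed, not the 35 intermediate actual values.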

Alternatively, you could predict an $h$-period increment $\Delta_h x_{t+h}:=x_{t+h}-x_t$ directly and add it to $x_t$, since $x_{t+h}=x_t+\Delta_h x_{t+h}$. You would have $$ \hat x_{t+h|t}=x_t+\widehat{\Delta_h x_{t+h}}. $$
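As a rough sketch of this second route, the training targets for horizon $h$ can be built directly from the original series. The helper name `h_step_increments` and the toy series are assumptions for illustration only.

```python
import numpy as np

def h_step_increments(x, h):
    """Return x[t+h] - x[t] for every t where both points exist."""
    x = np.asarray(x, dtype=float)
    return x[h:] - x[:-h]

x = np.array([10.0, 12.0, 11.0, 15.0, 18.0])  # toy series
h = 2
targets = h_step_increments(x, h)      # 1, 3, 7

# Sanity check: an h-period increment equals the sum of h one-step increments.
one_step = np.diff(x)                  # 2, -1, 4, 3
summed = one_step[:-1] + one_step[1:]  # this shortcut is valid for h = 2 only

# Inverting a predicted h-period increment needs only the last observed level.
predicted_increment = 4.0              # hypothetical model output
x_hat = x[-1] + predicted_increment    # 22.0
```

With this approach each horizon gets its own target series, so a 36-step forecast would use 36 such targets (or one model with 36 outputs).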

Related Solutions

Time Series Forecasting – Stationary vs Non-Stationary Time Series

[W]hat is the difference between forecasting using the original non-stationary series and the forecasting using the now stationary differenced series?

(Here I deliberately left out the qualification that the series can be transformed to a stationary series using first differencing and that the OP is interested in forecasting using ARIMA in particular.)

The problem with nonstationary data is that for most of the time series models, the model assumptions are violated when nonstationary data is used. This leads to the estimators no longer having the nice properties such as asymptotic normality and sometimes even consistency. So if you apply a model that requires a stationary series to a nonstationary series, you will likely get poor estimates of the model parameters and hence poor forecasts.

(Now let me add the qualification back.)

For an integrated series $x_t$ that can be made stationary using first differencing, $\Delta x_t$, and that can be approximated by an ARIMA model reasonably well, there are three ways to go:

1. Force stationarity and estimate an ARIMA($p,0,q$) model for the original series $x_t$.
2. Force, or allow for, first differencing so that you end up with an ARIMA($p,1,q$) model for the original data $x_t$.
3. Difference the series manually and then apply an ARIMA($p,0,q$) model to the differenced series $\Delta x_t$.

Option 1 is the only one that is clearly asking for trouble, as it forces stationarity in the presence of nonstationary data. Options 2 and 3 are essentially the same, the difference being whether you difference $x_t$ manually outside the model or as an initial step within the model.
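To make the third option concrete, here is a minimal sketch that differences a simulated series by hand, fits an AR(1) to the differences, and integrates the forecasts back to levels. The least-squares AR(1) fit is a hand-rolled stand-in for an ARIMA($1,0,0$) routine, not a library call, and the simulated data, seed, and coefficient 0.6 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an integrated series whose increments follow an AR(1) process.
n = 500
dx = np.zeros(n)
for t in range(1, n):
    dx[t] = 0.6 * dx[t - 1] + rng.normal()
x = np.cumsum(dx)

# Step 1: difference manually.
d = np.diff(x)

# Step 2: fit AR(1) on the differences (least-squares slope, no intercept).
phi = np.dot(d[:-1], d[1:]) / np.dot(d[:-1], d[:-1])

# Step 3: forecast the differences recursively, then integrate back to levels.
h = 5
d_hat = np.empty(h)
prev = d[-1]
for k in range(h):
    prev = phi * prev
    d_hat[k] = prev
x_hat = x[-1] + np.cumsum(d_hat)
```

An ARIMA($1,1,0$) routine applied to `x` would carry out the differencing and the integration internally, which is the sense in which the second and third options coincide.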

[C]an I expect the forecast for the stationary series to be more accurate than the forecast for non-stationary series?

If you have in mind an integrated series $x_t$ and its first-differenced stationary version $\Delta x_t$, you will have greater accuracy when forecasting $\Delta x_t$, but does that matter? It could be misleading to think that you can get more accurate forecasts by focusing on $\Delta x_t$ rather than $x_t$. It is perhaps the most natural to think about gains in accuracy when the underlying process of interest is kept the same, e.g. a gain in accuracy due to using a better approximation to the same process. Meanwhile, if you change the underlying object (go from $x_t$ to $\Delta x_t$), the gain is not really a gain, in the following sense. It is a bit like shooting at a target from 100m and from 10m. You will be more accurate from 10m, but isn't that obvious and irrelevant?

If you have in mind two unrelated series $x_{1,t}$ and $\Delta x_{2,t}$ where the first one is integrated while the second one is stationary, you may expect that in the long run you will have greater forecast accuracy for $\Delta x_{2,t}$. In the short run this might not hold if the variance of $\Delta x_{1,t}$ (the increments of the first process) is small compared to the variance of $\Delta x_{2,t}$.

I am aware that one advantage of forecasting with the stationary series is that it also produces forecast intervals (which depend on the assumption of a stationary series).

Actually, you can get forecast intervals regardless of whether the series is integrated or stationary. If you model an integrated time series using its first differences, the forecast errors of the increments accumulate when you form the forecast interval for the integrated series. That is why forecast intervals for an integrated series keep widening as the horizon grows (for a random walk the forecast error variance grows linearly in the horizon, so the interval width grows like its square root), while those of a stationary series level off at a finite width (illustrations can be found in time series textbooks).
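For the simplest integrated process, a random walk without drift, the accumulation can be written out explicitly: the $h$-step forecast error is the sum of $h$ independent increments, so its variance is $h\sigma^2$. The sketch below assumes Gaussian increments with a known standard deviation `sigma`; the function name and the numbers are hypothetical.

```python
import numpy as np

def random_walk_interval(last_observed, sigma, horizon, z=1.96):
    """Approximate 95% forecast intervals for a driftless random walk.

    The point forecast at every horizon is the last observed value;
    the forecast error variance at step h is h * sigma**2.
    """
    steps = np.arange(1, horizon + 1)
    half_width = z * sigma * np.sqrt(steps)
    point = np.full(horizon, float(last_observed))
    return point - half_width, point + half_width

lower, upper = random_walk_interval(100.0, sigma=2.0, horizon=4)
# Half-widths: 3.92 at h = 1 and 7.84 at h = 4 (variance quadruples, width doubles).
```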

Solved – How to make LSTM predict multiple time steps ahead

There are several approaches:

Recursive strategy

one many-to-one model

prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))
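A minimal sketch of the recursive strategy follows. The one-step model is passed in as a plain callable so that any trained network could be plugged in; the `window_mean` stand-in and the oldest-first window ordering are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def recursive_forecast(one_step_model, history, n_lags, horizon):
    """Roll a one-step-ahead model forward `horizon` steps.

    one_step_model : callable mapping the n_lags most recent values
                     (oldest first) to a prediction of the next value
    """
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    preds = []
    for _ in range(horizon):
        next_value = float(one_step_model(np.array(window)))
        preds.append(next_value)
        # Drop the oldest input and feed the prediction back in.
        window = window[1:] + [next_value]
    return np.array(preds)

def window_mean(window):
    """Toy stand-in for a trained model: predicts the mean of the window."""
    return window.mean()

preds = recursive_forecast(window_mean, [1.0, 2.0, 3.0], n_lags=2, horizon=3)
# preds: 2.5, 2.75, 2.625
```

Because each prediction is fed back as an input, errors can compound as the horizon grows.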

Direct strategy

multiple many-to-one models

prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model2(obs(t-2), obs(t-3), ..., obs(t-n))
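The direct strategy can be sketched with one separately fitted model per horizon. To keep the example short and runnable, a linear least-squares model stands in for each LSTM; the helper names and the toy trend series are assumptions. In this sketch every model receives the same most recent window as input.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build (X, Y): each row of X holds n_lags consecutive values,
    each row of Y holds the `horizon` values that follow them."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    X = np.array([series[i:i + n_lags] for i in range(n_rows)])
    Y = np.array([series[i + n_lags:i + n_lags + horizon] for i in range(n_rows)])
    return X, Y

def fit_direct(series, n_lags, horizon):
    """Fit one linear model (with intercept) per forecast step."""
    X, Y = make_windows(series, n_lags, horizon)
    A = np.column_stack([X, np.ones(len(X))])
    return [np.linalg.lstsq(A, Y[:, k], rcond=None)[0] for k in range(horizon)]

def predict_direct(models, last_window):
    """Apply each per-horizon model to the same most recent window."""
    a = np.append(np.asarray(last_window, dtype=float), 1.0)
    return np.array([a @ w for w in models])

series = np.arange(20, dtype=float)          # toy linear trend 0, 1, ..., 19
models = fit_direct(series, n_lags=3, horizon=2)
preds = predict_direct(models, series[-3:])  # approximately 20 and 21
```

No prediction is fed back as an input here, so errors do not compound, at the cost of training and maintaining one model per step.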

Multiple output strategy

one many-to-many model

prediction(t+1), prediction(t+2) = model(obs(t-1), obs(t-2), ..., obs(t-n))
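A minimal sketch of the multiple output strategy: a single model is fitted against a target matrix with one column per forecast step. A linear least-squares model again stands in for a many-to-many LSTM; the helper name and toy data are assumptions for illustration.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build (X, Y): each row of X holds n_lags consecutive values,
    each row of Y holds the `horizon` values that follow them."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    X = np.array([series[i:i + n_lags] for i in range(n_rows)])
    Y = np.array([series[i + n_lags:i + n_lags + horizon] for i in range(n_rows)])
    return X, Y

series = np.arange(20, dtype=float)      # toy linear trend 0, 1, ..., 19
n_lags, horizon = 3, 2
X, Y = make_windows(series, n_lags, horizon)

# One fit, one weight matrix: column k of W produces the step-(k+1) forecast.
A = np.column_stack([X, np.ones(len(X))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

last_window = np.append(series[-n_lags:], 1.0)
preds = last_window @ W                  # all horizons at once, approximately 20 and 21
```

With a purely linear model this gives the same result as fitting each horizon separately; with a neural network the outputs would share hidden layers, which is what distinguishes this strategy from the direct one. If the model is trained on differenced data, the output vector holds predicted increments and can be cumulatively summed onto the last observed level.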

Hybrid strategies
- combine two or more of the above strategies

Reference: Multi-Step Time Series Forecasting
