How to reduce the propagation of errors in multi-step time series forecasting

forecasting, lags, machine learning, random forest, time series

I have a multi-step forecasting task where I am predicting values $H$ hours in the future.
Supposing the forecast is issued at time $t$, I will produce predictions for the next $H$ hours: $\{\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+H}\}$.

One issue with multi-step forecasting when using lagged values of the target as predictors is that, beyond the first step, the model starts to use previously predicted values as features. For example, suppose my model takes several features as input, among them lagged values of $y$ (in the example below, lags of 1 and 2 time units). To predict the $t+1$ value, I compute:
$\hat{y}_{t+1} = \text{model}(y_t, y_{t-1}, \text{other features})$

Then, when predicting for $t+2$, I use the prediction $\hat{y}_{t+1}$ as one of the features:
$\hat{y}_{t+2} = \text{model}(\hat{y}_{t+1}, y_{t}, \text{other features})$

For $\hat{y}_{t+3}$, we would have:
$\hat{y}_{t+3} = \text{model}(\hat{y}_{t+2}, \hat{y}_{t+1}, \text{other features})$

Etc.
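The recursion above can be sketched in a few lines of Python. This is only an illustration: `recursive_forecast` and `toy_model` are hypothetical names, the toy linear rule stands in for a fitted model such as the Random Forest, and the "other features" are left out to keep the loop readable.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    predict_one_step: Callable[[Sequence[float]], float],
    history: Sequence[float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead, feeding each prediction back in as a lag."""
    window = list(history[-n_lags:])  # oldest lag first, most recent value last
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one_step(window)
        forecasts.append(y_hat)
        # Drop the oldest lag and append the prediction: from here on the
        # model sees its own (possibly wrong) output instead of a true value.
        window = window[1:] + [y_hat]
    return forecasts


# Hypothetical stand-in for a fitted one-step model:
# y_hat = 0.5 * (lag 1) + 0.25 * (lag 2)
def toy_model(lags: Sequence[float]) -> float:
    lag_2, lag_1 = lags
    return 0.5 * lag_1 + 0.25 * lag_2


forecasts = recursive_forecast(toy_model, history=[4.0, 8.0], n_lags=2, horizon=3)
print(forecasts)
```

After `n_lags` steps the window contains no true observations at all, which is where the accumulation of errors comes from.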

The model I am currently using is a Random Forest, trained using only true values of $y$ as lagged features. The best predictor is the 1-unit lagged value (the value at $t-1$ when predicting the $t$-th value). In fact, looking at the feature importances, it is by far the predictor with the highest relative importance.

Relying too much on this predictor means that the further we get from the first prediction time and the closer we get to the forecast horizon, the worse the predictions become, as errors tend to accumulate in the predictors.

Is there a method or a different ML model that can reduce the overall accumulation of errors and improve prediction accuracy?

Best Answer

I do not think there is a way around the issue, fundamentally. There are two ways of $h$-step-ahead forecasting: iterative/recursive (like yours) and direct. The latter goes as follows: for $h$-step-ahead forecasts from the point $t$, use a model with features $y_{t-h+1},y_{t-h},\dots$. Here are some threads discussing them. Here are some research papers on the topic.
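As a rough sketch of the direct strategy, the training set for each horizon can be built so that only observed values ever appear as features. The helper name `make_direct_dataset` is hypothetical, and the indexing is by forecast origin: the lags up to the origin are paired with the value $h$ steps after it. One separate model (for example, one Random Forest per $h$) would then be fitted on each dataset.

```python
from typing import List, Sequence, Tuple


def make_direct_dataset(
    series: Sequence[float], n_lags: int, h: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair the `n_lags` values up to each forecast origin with the value h steps later."""
    X, y = [], []
    # `origin` is the index of the last observed value available to the model.
    for origin in range(n_lags - 1, len(series) - h):
        X.append(list(series[origin - n_lags + 1 : origin + 1]))
        y.append(series[origin + h])
    return X, y


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# One dataset (and hence one model) per horizon; no predictions are fed back in.
X1, y1 = make_direct_dataset(series, n_lags=2, h=1)
X3, y3 = make_direct_dataset(series, n_lags=2, h=3)
print(X3, y3)
```

The price is that the model for a large $h$ has to map old information straight to a distant target, so each individual forecast is less accurate even though nothing compounds.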

Iterative multi-step forecasts suffer from accumulation of small errors. Direct multi-step forecasts suffer from larger errors that do not accumulate. But in the end both types of forecasts can be expressed as functions of the same features, so pick your poison.
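A small worked case may make the last point concrete. Assume, purely for illustration (this is not the model from the question), an AR(1) process with a single lag as the only feature:

$$y_t = \phi\, y_{t-1} + \varepsilon_t, \qquad |\phi| < 1, \quad \operatorname{Var}(\varepsilon_t) = \sigma^2 .$$

The recursive forecast applies the estimated one-step coefficient $h$ times, while the direct forecast estimates a separate coefficient $\beta_h$ by regressing $y_{t+h}$ on $y_t$:

$$\hat{y}^{\text{rec}}_{t+h} = \hat{\phi}^{\,h}\, y_t, \qquad \hat{y}^{\text{dir}}_{t+h} = \hat{\beta}_h\, y_t .$$

Both are functions of $y_t$ alone and differ only in how the coefficient is obtained. Even if $\phi$ were known exactly, the $h$-step error would be

$$y_{t+h} - \phi^{h} y_t = \sum_{i=0}^{h-1} \phi^{i}\, \varepsilon_{t+h-i}, \qquad \text{with variance } \sigma^2\, \frac{1-\phi^{2h}}{1-\phi^{2}},$$

which grows with $h$ whichever strategy is used.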
