Solved – Need help with lag features in regression forecasting

boosting, forecasting, lags, machine learning, time series

I am trying to build a time-series prediction model, but I am unsure whether I should use lag features. The training data has these lag features, since past values of the prediction target are available for every training date. But what about the forecasting data, for which the lag features are not available?

What I mean is that when I try to forecast, the only features I have are the dates, while my model also expects lagged features.

Best Answer

What you have recognised is a common issue, and it occasionally leads to situations where people are perplexed as to "why am I predicting flat values?". CV.SE has some very enlightening threads on this matter: "Why I get the same predict value in Arima model?" and "Flat Forecast from ARIMA and SARIMA".

Let's take as an example a simple time-series model, like a first-order auto-regressive model AR(1), where $y_t = \beta_0 + \beta_1 y_{t-1} + \epsilon_t$ and $\epsilon_t \sim N(0, \sigma_\epsilon^2)$. In this case our estimates $\hat{y}_t$ are simply $\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 y_{t-1}$, because $\epsilon_t$ is expected to be zero. Nevertheless, as we extrapolate, $y_{t-1}$ itself has to be estimated because it is unavailable. This leads to a situation where, after some point, we actually use our own predictions as input data. The fact that "we use our own predictions as inputs" is epitomised by seeing that certain time-series algorithms are presented under a filtering approach, the Kalman filter and the Holt-Winters filter being prime and widely used examples.
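To make the AR(1) recursion concrete, here is a minimal sketch that feeds each forecast back in as the next lag. The function name `ar1_forecast` and the coefficient values are made up for illustration; they are not estimates from any real data. For $|\hat{\beta}_1| < 1$ the iterates approach $\hat{\beta}_0 / (1 - \hat{\beta}_1)$, which is one way the "flat forecast" shows up.

```python
import numpy as np

def ar1_forecast(y_last, beta0, beta1, horizon):
    """Iterate y_hat[h] = beta0 + beta1 * y_hat[h-1], seeded with the last observed value."""
    forecasts = []
    prev = y_last  # only the first step uses a real observation
    for _ in range(horizon):
        prev = beta0 + beta1 * prev  # later steps reuse our own prediction
        forecasts.append(prev)
    return np.array(forecasts)

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = 1.0, 0.5
path = ar1_forecast(y_last=10.0, beta0=beta0, beta1=beta1, horizon=8)
long_run_mean = beta0 / (1.0 - beta1)  # limit of the recursion when |beta1| < 1

print(path)
print(long_run_mean)
```

After the first step no observed value enters the recursion any more, so the forecast path is fully determined by the fitted coefficients and the last observation.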

So, to be specific about what was originally asked: if we want to create our own forecasting routine that does not simply offer one-step-ahead forecasts, we need to be able to populate our "lagged features" with their predicted values. That's why most forecasting routines (e.g. forecast::forecast, smooth::forecast, prophet::make_future_dataframe, bsts::predict, KFKSDS::predict, etc.) have an explicit horizon, periods, n.ahead, etc. argument. We need to know how far we look into the future to appropriately update/populate our beliefs to get there!
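A hand-rolled routine along these lines could look like the sketch below. It is only an illustration under some assumptions: the helper names (`make_lag_matrix`, `fit_linear`, `recursive_forecast`) are hypothetical, the series is simulated, and a plain least-squares fit stands in for whatever regressor you actually use (a boosted model would slot into the same loop). The point is the `horizon` loop, where each prediction is appended to the history and becomes a lag for the next step.

```python
import numpy as np

def make_lag_matrix(y, n_lags):
    """Build rows [1, y[t-1], ..., y[t-n_lags]] with target y[t]."""
    rows = [y[t - n_lags:t][::-1] for t in range(n_lags, len(y))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    return X, y[n_lags:]

def fit_linear(y, n_lags):
    """Least-squares fit of y[t] on an intercept and its n_lags previous values."""
    X, target = make_lag_matrix(y, n_lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def recursive_forecast(y, coef, n_lags, horizon):
    """Forecast `horizon` steps ahead, filling unknown lags with earlier predictions."""
    history = list(y[-n_lags:])  # last observed values
    out = []
    for _ in range(horizon):
        lags = history[-n_lags:][::-1]  # most recent value first, as in training
        y_hat = coef[0] + float(np.dot(coef[1:], lags))
        out.append(y_hat)
        history.append(y_hat)  # the prediction becomes a lag for the next step
    return np.array(out)

# Simulated AR(1)-like series, for illustration only
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 1.0 + 0.5 * y[t - 1] + rng.normal(scale=0.1)

coef = fit_linear(y, n_lags=2)
forecast = recursive_forecast(y, coef, n_lags=2, horizon=5)
print(forecast)
```

Note that features which are known in advance, such as the dates themselves, do not need this treatment; only the lagged target values have to be filled in step by step.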
