# Time Series Forecasting – Addressing Apparent Shift in Actual vs Predicted Values

feature-engineering, forecasting, predictive-models, regression, scikit-learn

Tools/languages/techniques I am using

• python
• scikit-learn
• different regression models (only linear regression is shown here for simplicity)

I am working on a regression problem. The data I have is hourly time-series consumption data, and I am trying to make a one-step-ahead prediction.

I first prepared the data and made sure that no data from the future leaks into the training data. For consumption at a given hour (h0), a record looks as follows:

| feature1 | feature2 | target |
|----------|----------|--------|
| h-2      | h-1      | h0     |

Where h-1 and h-2 are the previous two hours.

### Note

I am showing only two lagged hours here for simplicity. In reality, I am using several lag values and moving averages as features.
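The feature layout described above can be sketched with pandas. This is only an illustration under assumptions: the helper name `make_lag_features`, the column names, and the toy values are hypothetical and not taken from the original data. The point it shows is that the moving average is computed on the series shifted by one hour, so the window ends at h-1 and never includes h0.

```python
import pandas as pd

def make_lag_features(series, lags=(1, 2), ma_window=3):
    """Build lag and moving-average features that only use past values."""
    df = pd.DataFrame({"target": series})
    for lag in lags:
        # lag_k holds the value observed k hours before the target hour
        df[f"lag_{lag}"] = series.shift(lag)
    # Shift by one hour first, so the rolling window ends at h-1, not h0
    df[f"ma_{ma_window}"] = series.shift(1).rolling(ma_window).mean()
    # Drop the first rows, which have no complete history
    return df.dropna()

# Toy hourly values standing in for real consumption data (assumption)
index = pd.date_range("2024-01-01", periods=8, freq=pd.Timedelta(hours=1))
consumption = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 16.0, 18.0], index=index
)
features = make_lag_features(consumption)
print(features)
```

Calling `rolling` directly on the unshifted series would include the target hour in its own feature, which is a common source of leakage.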

I trained the model and then applied the predict function to test data.

After that, I plotted the actual vs predicted values (y_test vs y_predict), but the prediction appears to be shifted one hour into the future relative to the actual values, as you can see below

When I shifted the prediction back by one hour, the performance difference was huge:

R2 increased from 0.64 to 0.89 (a 39% improvement)

RMSE dropped from 1003 to 536.8 (a 46.5% improvement)
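A shift like this can also be checked numerically instead of by eye, by scoring the predictions at several offsets. The sketch below is a minimal illustration on synthetic data: the function name `rmse_by_offset` is hypothetical, and the "forecast" is deliberately constructed to repeat the previous actual value so that the effect is visible.

```python
import numpy as np

def rmse_by_offset(y_true, y_pred, max_offset=3):
    """RMSE between y_true[t] and y_pred[t + k] for k = 0..max_offset."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = {}
    for k in range(max_offset + 1):
        # Pair the actual value at t with the prediction made for t + k
        actual = y_true[: len(y_true) - k]
        predicted = y_pred[k:]
        scores[k] = float(np.sqrt(np.mean((actual - predicted) ** 2)))
    return scores

rng = np.random.default_rng(0)
y_true = np.cumsum(rng.normal(size=200))  # synthetic random-walk "actuals"
y_pred = np.empty_like(y_true)
y_pred[0] = y_true[0]
y_pred[1:] = y_true[:-1]  # a forecast that just repeats the previous actual

scores = rmse_by_offset(y_true, y_pred)
best_offset = min(scores, key=scores.get)
print(scores, best_offset)
```

If the lowest error occurs at a non-zero offset, the predictions are tracking earlier actual values rather than anticipating the current one. Shifting the predictions back improves the score on paper, but it is not a usable fix, because it pairs each prediction with an hour it was not made for.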

#### My Question

• What could I be doing wrong?
• Am I doing something wrong or could this shift be an indication of something else?

This shift is an indication of a very strong correlation with the previous lag h-1 and a low correlation with other feature variables.
In other words, the model is mainly using h-1 to estimate the current hour consumption h.
While this can lead to acceptable results (and sometimes really good results) in terms of R2 and RMSE, it also means that the model is not really better than a baseline model that just uses h-1 to estimate h (i.e. a persistence forecast, where the prediction for hour h is simply the value observed at h-1).
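One way to test this is to score the model against that baseline on the same test set. The following is a minimal sketch, not the asker's actual pipeline: it assumes a synthetic, strongly autocorrelated series, the two lag features from the question, and a chronological 80/20 split. The `skill` value is a hypothetical name for the relative RMSE gain over the baseline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
# Synthetic, strongly autocorrelated series (assumption, not real data)
y = np.cumsum(rng.normal(size=500)) + 100.0

# Features: h-2 and h-1; target: h0
X = np.column_stack([y[:-2], y[1:-1]])
target = y[2:]

split = int(0.8 * len(target))  # chronological split, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = target[:split], target[split:]

model = LinearRegression().fit(X_train, y_train)
y_model = model.predict(X_test)
y_naive = X_test[:, 1]  # baseline: predict h0 with the observed h-1

def rmse(actual, predicted):
    return float(np.sqrt(mean_squared_error(actual, predicted)))

print("model RMSE:", rmse(y_test, y_model), "R2:", r2_score(y_test, y_model))
print("naive RMSE:", rmse(y_test, y_naive), "R2:", r2_score(y_test, y_naive))

# Relative gain over the baseline: near 0 means no real improvement
skill = 1.0 - rmse(y_test, y_model) / rmse(y_test, y_naive)
print("skill vs baseline:", skill)
```

If the model's scores are close to the baseline's, the lag features are doing almost all of the work, and a high R2 alone does not show that the model has learned anything beyond "next hour looks like this hour".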