Tools/languages/techniques I am using

- python
- scikit-learn
- different regression models (only linear regression is shown here for simplicity)

I am working on a regression problem. The data I have is hourly time-series consumption data, and I am trying to make a one-step-ahead prediction.

I first prepared the data and made sure that no data from the future leaks into the training data. For consumption at a certain hour (h0), a record looks as follows:

feature1 | feature2 | target |
---|---|---|
h-2 | h-1 | h0 |

Where **h-1** and **h-2** are the previous two hours.

### Note

I am using two lagged hours here for simplicity. In reality, I am using different lag values and moving averages as features.
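As a rough sketch of the feature layout described above, assuming pandas and a synthetic hourly series (the helper `make_lag_frame`, the column names, and the window size are hypothetical and not taken from my real pipeline), lag and moving-average features can be built so that each row only sees past hours:

```python
import numpy as np
import pandas as pd

def make_lag_frame(series, lags=(1, 2), ma_window=3):
    """Build lag and moving-average features that only look at past hours."""
    frame = pd.DataFrame({"target": series})
    for lag in lags:
        # lag_1 holds the value at h-1, lag_2 the value at h-2, and so on.
        frame[f"lag_{lag}"] = series.shift(lag)
    # Shift before rolling so the window ends at h-1 and never includes h0.
    frame[f"ma_{ma_window}"] = series.shift(1).rolling(ma_window).mean()
    # Drop the first rows, which have no complete history.
    return frame.dropna()

# Synthetic hourly consumption, for illustration only.
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
consumption = pd.Series(1000 + rng.normal(0, 50, size=48).cumsum(), index=idx)

frame = make_lag_frame(consumption)
print(frame.head())
```

Shifting before taking the rolling mean is the detail that matters here: a rolling mean computed directly on the series would include h0 itself and leak the target into the features.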

I trained the model and then applied the predict function to test data.

After that, I plotted the actual values against the predictions (*y_test* vs *y_predict*), but the prediction appears to be shifted one hour into the future, as you can see below.

When I shifted the prediction back by one hour, the performance difference was huge:

- R2 increased from 0.64 to 0.89 (39% improvement)
- RMSE dropped from 1003 to 536.8 (46.5% improvement)
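A minimal sketch of this kind of shift check, assuming synthetic stand-ins for *y_test* and *y_predict* (the function `score_at_shift` is a hypothetical helper, and the "prediction" here deliberately echoes the previous hour). Scoring a shifted prediction is only a diagnostic: a shifted forecast is not something that could be produced in real time.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def score_at_shift(y_true, y_pred, shift):
    """Return (R2, RMSE) after moving y_pred `shift` steps earlier in time."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if shift > 0:
        # Pair the actual value at hour t with the prediction made for hour t + shift.
        y_true, y_pred = y_true[:-shift], y_pred[shift:]
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return float(r2_score(y_true, y_pred)), rmse

# Synthetic data, for illustration only: a daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(500)
y_test = 5000 + 1500 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 300, size=500)
# A "prediction" that simply repeats the previous hour's actual value.
y_predict = np.concatenate([[y_test[0]], y_test[:-1]])

results = {shift: score_at_shift(y_test, y_predict, shift) for shift in (0, 1, 2)}
for shift, (r2, rmse) in results.items():
    print(f"shift={shift}: R2={r2:.3f}, RMSE={rmse:.1f}")
```

If the scores peak sharply at a shift of one, the prediction is behaving like a delayed copy of the actual series.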

#### My Question

- What could I be doing wrong?
- Am I doing something wrong or could this shift be an indication of something else?

## Best Answer

This shift is an indication of a very strong correlation with the previous lag `h-1` and a low correlation with the other feature variables. In other words, the model is mainly using `h-1` to estimate the current hour's consumption `h`.

While this can lead to acceptable results (and sometimes really good results) in terms of `R2` and `RMSE`, it also means that the model is not really better than a baseline model that just uses `h-1` to estimate `h` (i.e. `f(h) = h-1`). In this case, a machine learning model is just adding complexity with no clear improvement in performance.

Nothing smart is going on here.

This video from Marco Peixeiro, the author of the book *Time Series Forecasting in Python*, discusses this exact problem as well.
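The baseline comparison described in this answer can be sketched as follows. This is an illustration under stated assumptions, not the asker's actual setup: the data is a synthetic random walk, the split is a simple chronological 80/20 cut, and all variable names are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic consumption series (random walk), for illustration only.
rng = np.random.default_rng(42)
y = 5000 + rng.normal(0, 100, size=1000).cumsum()

# Align the target h with its two previous hours.
target = y[2:]
lag_1 = y[1:-1]   # value at h-1
lag_2 = y[:-2]    # value at h-2
X = np.column_stack([lag_1, lag_2])

# Chronological split: no shuffling, so the test set lies strictly in the future.
split = int(0.8 * len(target))
X_train, X_test = X[:split], X[split:]
y_train, y_test = target[:split], target[split:]

model = LinearRegression().fit(X_train, y_train)
rmse_model = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

# Naive (persistence) baseline: predict h with the value at h-1.
rmse_naive = float(np.sqrt(mean_squared_error(y_test, X_test[:, 0])))

print(f"coefficients (lag_1, lag_2): {model.coef_}")
print(f"RMSE model: {rmse_model:.1f}")
print(f"RMSE naive: {rmse_naive:.1f}")
```

If the model's RMSE is close to the naive RMSE and the `lag_1` coefficient is near 1, the model is effectively reproducing `h-1`, which is what produces the one-hour shift in the plot.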