Solved – Validation set for hyperparameter tuning of ML time series model

hyperparameter, machine learning, time series

I'm developing an ML-based model to forecast the daily sales of a whole month.

This model takes as input a set of precomputed time series features: day_of_week, day_of_month, day_of_year, week_of_year, month and many more. Additionally, the time series have a strong monthly seasonal pattern, and the patterns can differ greatly from one month to another.

The problem is that the best hyperparameters of the model vary a lot depending on the chosen validation set.

Let's say I want to forecast July-2019. I tried using each month from July-2018 to June-2019 as the validation set and found a very different hyperparameter configuration for each one. I think this is due to the sales pattern changing between months.

For these reasons, my intuition tells me to use July-2018 as the validation set, as it is more "representative" of what my test set would look like. However, it also seems that I would be losing 11 months of data for validating the model.

Which approach for selecting the validation set would you recommend for this problem?

Best Answer

I also came across this problem when working on a forecasting project.

First, say you are doing a grid search over your hyperparameters, so you have a set of candidate configurations you want to test.

Because this is a time series dataset, we always want to predict into the future, so every validation month must come after the data used for fitting. Depending on how many "folds" you wish to use, you can compute the CV error like so:

  • pick a hyperparameter configuration
  • fit on the first month and "validate" on the second month
  • compute your error
  • fit on the first two months and "validate" on the third month
  • compute your error
  • continue like this until you have fit on the first n-1 months and validated on the nth month
  • average the fold errors to get your CV error

Do this for each hyperparameter configuration in your search space. Choose the one that gives the lowest average CV error.
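The steps above could be sketched roughly as follows. This is only an illustration, not your actual model: the "model" is a hypothetical toy that predicts a per-day_of_week mean shrunk toward the overall mean, `shrink` stands in for whatever hyperparameter you are tuning, and the data is made up. The function names are mine; swap in your own fit/predict code and error metric.

```python
from statistics import mean

def fit_dow_means(X, y, shrink):
    """Toy model (assumption): day-of-week means shrunk toward the overall mean."""
    overall = mean(y)
    groups = {}
    for dow, sales in zip(X, y):
        groups.setdefault(dow, []).append(sales)
    by_dow = {d: (sum(v) + shrink * overall) / (len(v) + shrink)
              for d, v in groups.items()}
    return overall, by_dow

def predict_dow_means(model, X):
    overall, by_dow = model
    return [by_dow.get(dow, overall) for dow in X]

def mae(y_true, y_pred):
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))

def expanding_window_cv(months, shrink):
    """Average error over folds: fit on months[0..k-1], validate on months[k]."""
    fold_errors = []
    for k in range(1, len(months)):
        train_X = [x for X, _ in months[:k] for x in X]
        train_y = [v for _, y in months[:k] for v in y]
        val_X, val_y = months[k]
        model = fit_dow_means(train_X, train_y, shrink)
        fold_errors.append(mae(val_y, predict_dow_means(model, val_X)))
    return mean(fold_errors)

def grid_search(months, grid):
    """Score every candidate and return the one with the lowest CV error."""
    scores = {shrink: expanding_window_cv(months, shrink) for shrink in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up data: 6 "months" of 28 days, in chronological order.
months = []
for m in range(6):
    X = [i % 7 for i in range(28)]  # day_of_week feature
    y = [100 + 10 * dow + 5 * m + (i * 3 + m) % 4 for i, dow in enumerate(X)]
    months.append((X, y))

grid = [0.0, 1.0, 5.0, 20.0]
best, scores = grid_search(months, grid)
print("best shrink:", best)
print("CV error per candidate:", scores)
```

Note that every fold only validates on data that comes after its training window, so all months except the first contribute to the validation score.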

After you have chosen the hyperparameter, you can fit the model on all the data except the month that you want to forecast for. Use the fitted model to forecast for the required month.
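As a rough sketch of this final step: keep only the rows dated before the month you want to forecast, refit with the chosen hyperparameter, and predict every day of that month. The toy model (shrunken day-of-week means), the `shrink` value of 2.0, and the synthetic sales history are all assumptions for illustration; substitute your own model and the configuration your search selected.

```python
import calendar
from datetime import date, timedelta
from statistics import mean

def fit_dow_means(rows, shrink):
    """Toy model (assumption): weekday means shrunk toward the overall mean."""
    overall = mean(sales for _, sales in rows)
    groups = {}
    for day, sales in rows:
        groups.setdefault(day.weekday(), []).append(sales)
    by_dow = {k: (sum(v) + shrink * overall) / (len(v) + shrink)
              for k, v in groups.items()}
    return overall, by_dow

def forecast_month(rows, year, month, shrink):
    """Fit on all rows before the target month, then forecast each of its days."""
    cutoff = date(year, month, 1)
    train = [r for r in rows if r[0] < cutoff]  # drop the target month and later
    overall, by_dow = fit_dow_means(train, shrink)
    n_days = calendar.monthrange(year, month)[1]
    days = [cutoff + timedelta(days=i) for i in range(n_days)]
    forecast = {d: by_dow.get(d.weekday(), overall) for d in days}
    return train, forecast

# Made-up daily history from 2018-07-01 to mid-July 2019.
start = date(2018, 7, 1)
history = []
for i in range(380):
    day = start + timedelta(days=i)
    history.append((day, 100 + 10 * day.weekday()))

# shrink=2.0 is a placeholder for the value chosen by cross-validation.
train, forecast = forecast_month(history, 2019, 7, shrink=2.0)
print("last training day:", max(d for d, _ in train))
print("days forecast:", len(forecast))
```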

HTH.
