Solved – Do I need a test set when using time series cross-validation

Tags: cross-validation, time-series

I'm working with time series data (Forex data) using a random forest. When training the model with the caret package in R, I have a few options. One of the options is 10-fold cross-validation, but I'm not sure that's a good idea for time series. I could also use time series cross-validation (`"timeslice"` in the caret package).
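For reference, here is roughly how I would set that up; a minimal sketch, assuming a data frame `fx` ordered by time with a numeric target column `y` (the names and window sizes are placeholders, not real settings):

```r
# Sketch: rolling-origin ("timeslice") resampling in caret, assuming a data
# frame `fx` with predictor columns and a numeric target `y`, oldest row first.
library(caret)

ts_ctrl <- trainControl(
  method        = "timeslice",
  initialWindow = 500,   # size of the first training window
  horizon       = 20,    # number of future points each split is evaluated on
  fixedWindow   = TRUE,  # slide a window of constant size (FALSE = expanding)
  skip          = 20     # move the origin forward 20 points between splits
)

set.seed(1)
rf_fit <- train(y ~ ., data = fx, method = "rf", trControl = ts_ctrl)
rf_fit$results   # resampled (out-of-sample) error per candidate mtry value
```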

What I don't understand is the need for an independent test set when using time series cross-validation to get an estimate of the prediction error. As this figure shows (figure by Rob J Hyndman: https://robjhyndman.com/hyndsight/tscv/), the validation set (red dots) is never used during training; it is always new, unseen data. This is unlike 10-fold cross-validation, where 9 out of 10 times you have already seen the data during training, so the prediction error is underestimated.

[Figure (Rob J Hyndman, https://robjhyndman.com/hyndsight/tscv/): rolling-origin time series cross-validation, with the training set growing over time and the validation points (red dots) always lying after it.]
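In code, my understanding of the scheme in the figure is something like the following rough sketch, where `y` is the target series and `X` a matrix of lagged predictors (both placeholders), and each model only ever sees data from before the point it is scored on:

```r
# Hand-rolled rolling origin: train on the past, score on the next point.
library(randomForest)

n      <- length(y)
start  <- 500                       # first forecast origin (placeholder)
errors <- numeric(0)

for (t in seq(start, n - 1, by = 20)) {
  fit    <- randomForest(X[1:t, , drop = FALSE], y[1:t])   # past data only
  pred   <- predict(fit, X[t + 1, , drop = FALSE])         # next point ("red dot")
  errors <- c(errors, y[t + 1] - pred)                     # always out of sample
}
sqrt(mean(errors^2))                # rolling-origin RMSE
```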

So my question is: isn't the error on the validation points (the red dots) already a good estimate of the test error when using time series cross-validation, as the figure above shows?

If I understand chapter 2.5 of "Forecasting: Principles and Practice" correctly, then I do not need an independent test set, but I'm not 100% sure my understanding is right.

Best Answer

If you just fix all hyperparameters and do time series cross-validation as in the picture in your post, then you do not need a separate test set to evaluate out-of-sample performance; the forecast error on the validation points is a fair estimate of it.

But if you do tune your model and pick the best tuning values based on validation performance, then the validation error will be an optimistic estimate of test error.
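A minimal sketch of that workflow with caret, assuming the same kind of data frame `fx` as in the question (column names, window sizes, and the `mtry` grid are placeholders): hold out the most recent block as a genuine test set, tune with rolling-origin CV on the earlier data, and score the selected model once on the held-out block.

```r
library(caret)

n_test   <- 200                                    # size of final test block (placeholder)
train_df <- fx[1:(nrow(fx) - n_test), ]
test_df  <- fx[(nrow(fx) - n_test + 1):nrow(fx), ]

ts_ctrl <- trainControl(method = "timeslice",
                        initialWindow = 500, horizon = 20,
                        fixedWindow = TRUE, skip = 20)

set.seed(1)
rf_tuned <- train(y ~ ., data = train_df, method = "rf",
                  tuneGrid  = data.frame(mtry = c(2, 4, 8)),  # chosen by validation error
                  trControl = ts_ctrl)

# The validation error of the selected mtry is optimistic; this is not:
postResample(predict(rf_tuned, test_df), test_df$y)
```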

Also, the following might be a misunderstanding:

This is unlike 10-fold cross-validation, where 9 out of 10 times you have already seen the data during training, so the prediction error is underestimated.

The underestimation happens only if you do hyperparameter tuning and then report the validation error of the best-tuned model. Otherwise, the validation error on fold $k$ is a fair evaluation of out-of-sample performance for that particular set of tuning parameters, precisely because those parameters were not selected based on that validation performance.
