Time Series – Re-training on the Entire Time Series After Cross-Validation

Tags: cross-validation, forecasting, time series

In his book, Hands-On Time Series Analysis with R, Rami Krispin writes: 'Typically, once we have trained and tested the model using the training and testing partitions, we will retrain the model with all of the data (or at least the most recent observation in the chronological order)'.

My question is this:

In time series cross-validation methods such as the expanding and sliding window, the most recent observations fall within the test set, because the chronological order must be preserved.

Intuitively, the most recent observations can be the most influential predictors, although this is not always true. But, for the cases where the most recent observations are predictive, aren't we missing the information from those observations by not using them for training? If so, what are your thoughts on measuring model performance with one of the time series cross-validation methods first and then re-training on the entire data set for the final model, as Rami suggests?
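To make the setup concrete, here is a minimal sketch of expanding-window cross-validation followed by a refit on all data. It assumes a univariate series of 120 points, simple lag features, and a scikit-learn linear model; the helper name make_lag_features and the choice of 12 lags are illustrative, not taken from the question or the book.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=120))        # stand-in for ~10 years of monthly data

def make_lag_features(y, n_lags=12):
    """Build a simple lag matrix: predict y[t] from y[t-1] ... y[t-n_lags]."""
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    return X, y[n_lags:]

X, target = make_lag_features(y)

# Expanding window: each fold trains on everything before its test fold,
# so the most recent observations only ever appear in the *test* folds.
cv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, test_idx in cv.split(X):
    model = LinearRegression().fit(X[train_idx], target[train_idx])
    scores.append(mean_absolute_error(target[test_idx], model.predict(X[test_idx])))
print("CV MAE per fold:", np.round(scores, 3))

# The retraining step the quote describes: refit the chosen model on all data
final_model = LinearRegression().fit(X, target)
```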

On the other hand, when the entire data set is used for training, there is a danger of overfitting, and no validation set remains.

Also, let's say I put aside the last 10% of the time series as a test set for out-of-sample predictions. The remaining 90% is then the total train-validation set. When using a cross-validation method, the validation set (say another 10%) must also be the most recent chronologically, so at most the remaining 80% is available for model training and parameter tuning. After the cross-validation step, I have a single chosen model with determined hyperparameters. Next, I retrain this model on the entire 90% and obtain new parameter estimates (based on the 90%), but the model type itself was still selected using the first 80% of the data. For example, if I am looking at 10 years of historical data, my model is selected based on the first 8 years, and that makes me wonder as well.
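For concreteness, here is a sketch of the split layout that paragraph describes, again assuming 120 points (10 "years" of monthly data); the exact index boundaries and the choice of scikit-learn's TimeSeriesSplit are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n = 120
test_start = int(n * 0.9)                 # last 10% held out for out-of-sample testing
train_val_idx = np.arange(test_start)     # first 90%: model selection + final refit
test_idx = np.arange(test_start, n)       # final 10%: untouched until the very end

# Within the 90%, expanding-window CV keeps each validation fold strictly after
# its training window. On the last fold this reproduces roughly the 80% train /
# 10% validation / 10% test layout described above.
cv = TimeSeriesSplit(n_splits=5, test_size=len(train_val_idx) // 10)
for fold, (tr, val) in enumerate(cv.split(train_val_idx)):
    print(f"fold {fold}: train 0..{tr[-1]}, validate {val[0]}..{val[-1]}")

# After choosing the model and hyperparameters, refit on the whole 90%
# (train_val_idx) and only then score once on test_idx.
```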

Any thoughts? Thanks.

Best Answer

This issue is indeed a bit of a problem in time series forecasting. (And more generally in prediction, if your test sample can be suspected of differing systematically from the training and validation samples.) I would make two points here.

First, whether the most recent data is really "most influential" is very much open to debate, and will depend heavily on your use case. If you are forecasting demand for a new product, yes. (But then you would probably use specialized models, like the Bass diffusion model, and cross-train them on other products - not choose the model based on a holdout set of the focal time series.) But when my forecast consumers ask me to "put more emphasis on recent observations" or similar, I always push back unless they can provide an actual argument for why the data generating process should have evolved or changed recently. (This prior of mine may reflect that I work in a very mature industry.)

Second, if there are actual reasons to suppose the DGP has changed, you should indeed treat the time series differently, and not rely on a holdout validation sample. For instance, you might use specialized models, like the Bass diffusion model mentioned above. Or you might use only the most recent data, treat it as a short time series, and use an appropriate method. Or you could fit one model to the entire series and another only to the last observations, and take the average of the two forecasts.
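A minimal sketch of that last idea follows, assuming simple exponential smoothing from statsmodels as the base model and a 24-observation "recent" window; both choices are illustrative, not something the answer prescribes.

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=120))              # illustrative series

h = 12                                           # forecast horizon
full_fit = SimpleExpSmoothing(y).fit()           # model fitted to the entire history
recent_fit = SimpleExpSmoothing(y[-24:]).fit()   # model fitted to recent data only

# Simple average of the two forecasts
combined = (full_fit.forecast(h) + recent_fit.forecast(h)) / 2
print(combined)
```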

Bottom line: you really need to think about the time series you are forecasting. (Or trust in an automatic system and live with potentially lower accuracy - that may well be a rational use of your time.)