Solved – Final Model from Time Series Cross Validation

cross-validation, forecasting, time series

I have previous experience with 'normal' K-fold cross-validation for model tuning, and I am slightly confused by how it applies to time-series models.

It is my understanding that for time-series models the analogue of cross-validation is the 'rolling forward origin' procedure described by Hyndman. This makes plenty of sense to me, and the code below, taken from Hyndman's blog, uses the tsCV function in R to show how the CV error differs from the error computed on the entire dataset in one go.

library(fpp)
e <- tsCV(dj, rwf, drift=TRUE, h=1)        # one-step-ahead errors, rolling forecast origin
sqrt(mean(e^2, na.rm=TRUE))                # cross-validated RMSE
## [1] 22.68249
sqrt(mean(residuals(rwf(dj, drift=TRUE))^2, na.rm=TRUE))  # RMSE of residuals on the full series
## [1] 22.49681

Now, the link above mentions that the drift parameter is re-estimated at each new forecast origin. In 'normal' CV I would have a grid of parameters to evaluate against each fold, so I could average across folds to determine the best parameters to use. I would then use those 'best' parameters to fit the full training set and use that as my final model to evaluate on my previously held-out test set. Note that this is nested cross-validation, so I am not training on my test set at any point.

This is clearly not the case with the 'rolling forward origin' procedure, where the parameters are re-optimized for each fold (at least for R methods like bats, tbats, auto.arima, etc.). Am I mistaken to think about this method in terms of model parameter tuning, or how would I choose the time series model parameters to set for the final model? Or is parameter tuning not considered an issue with time series models, where optimization seems to be part of model fitting, and the result of the CV is just to say how well each model performs overall? And is the final model, built with the majority of the data at the end, the model I would use?
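For example, I could imagine comparing two candidate specifications by their CV error, with the drift flag standing in for a tuning parameter (purely illustrative, following the same tsCV pattern as above):

library(fpp)
e_drift   <- tsCV(dj, rwf, drift=TRUE,  h=1)   # drift re-estimated at each origin
e_nodrift <- tsCV(dj, rwf, drift=FALSE, h=1)   # plain random walk, no drift term
sqrt(mean(e_drift^2, na.rm=TRUE))
sqrt(mean(e_nodrift^2, na.rm=TRUE))

But even after picking the 'winner' this way, I am unsure whether the final model should be refit on all the data or simply taken from the last fold.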

I realize this can be rephrased as an even simpler question: following cross-validation ('rolling forward origin'), do I just use the last model built (the one fit on the largest training window) as the final fitted model? Or what is suggested?

Best Answer

You can combine rolling forward origin with k-fold cross-validation (aka backtesting with cross-validation). Determine the folds up front once, and at each rolling time iterate through the k folds, training on k-1 of them and testing on the one held out. The union of all the held-out test folds gives you one complete coverage of the data available at that time, and the training folds cover that data k-1 times, which you can aggregate in whatever way is appropriate (e.g., the mean). Then score train and test separately, as you ordinarily would, to get separate train/test scores at each rolling time.
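To make that concrete, here is a minimal sketch under a simplifying assumption: the forecasting model is a plain regression on lagged values, so individual observations can be assigned to folds once the lags are formed (model-based forecasters like ARIMA would need a different mechanism for holding out interior points). The series, lag order, fold count, origins, and helper names (make_lags, fold_id, scores) are all illustrative choices, not an established API.

# backtesting with cross-validation: folds fixed up front, evaluated at each rolling time
set.seed(1)

y <- as.numeric(WWWusage)   # example series from the datasets package
p <- 3                      # number of lags used as predictors

# build a lagged design matrix: row i predicts y[p + i] from its previous p values
make_lags <- function(y, p) {
  n <- length(y)
  X <- sapply(seq_len(p), function(j) y[(p - j + 1):(n - j)])
  data.frame(y = y[(p + 1):n], X)
}
dat <- make_lags(y, p)

k       <- 5
fold_id <- sample(rep(1:k, length.out = nrow(dat)))   # folds determined once, up front
origins <- seq(50, nrow(dat), by = 20)                # rolling evaluation times

scores <- expand.grid(origin = origins, fold = 1:k)
scores$test_rmse <- NA_real_

for (i in seq_len(nrow(scores))) {
  t     <- scores$origin[i]
  fold  <- scores$fold[i]
  avail <- seq_len(t)                        # rows available at this rolling time
  train <- avail[fold_id[avail] != fold]     # the k-1 training folds
  test  <- avail[fold_id[avail] == fold]     # the held-out fold

  fit  <- lm(y ~ ., data = dat[train, ])
  pred <- predict(fit, newdata = dat[test, ])
  scores$test_rmse[i] <- sqrt(mean((dat$y[test] - pred)^2))
}

# aggregate the k test folds at each rolling time (here by the mean)
aggregate(test_rmse ~ origin, data = scores, FUN = mean)

One caveat of this design: with lagged predictors, test rows still see nearby training values through their lags, so the folds are not fully independent; that coupling is part of the trade-off in backtesting with cross-validation.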

When optimizing parameters, create a separate holdout set first, and then do the cross-validation just described on only the remaining data. For each parameter to be optimized, you need to decide whether that parameter is independent of time (so you can perform the optimization over all rolling times) or dependent on time (so the parameter is optimized separately at each time). If the latter, you might represent the parameter as a function of time (possibly linear) and then optimize the time-independent coefficients of that function over all times.
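A minimal sketch of that holdout-then-cross-validate workflow, reusing the rwf drift flag from the question as a stand-in for a real tuning grid (the 50-point holdout size, the candidate set, and names like cv_scores are arbitrary illustrations):

library(fpp)

n        <- length(dj)
holdout  <- window(dj, start = time(dj)[n - 49])   # last 50 points held out
trainval <- window(dj, end = time(dj)[n - 50])     # data used for cross-validation

# cross-validate each candidate on the non-holdout data only
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
cv_scores <- c(
  drift    = rmse(tsCV(trainval, rwf, drift = TRUE,  h = 1)),
  no_drift = rmse(tsCV(trainval, rwf, drift = FALSE, h = 1))
)
best <- names(which.min(cv_scores))

# refit the winning specification on all non-holdout data and score it on the holdout
final_fc <- rwf(trainval, drift = (best == "drift"), h = length(holdout))
rmse(holdout - final_fc$mean)

The final model reported is then the winning specification refit on everything except the holdout (or on the full dataset once evaluation is finished), not the model from any single fold.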