Time Series – Cross-Validation Techniques for Time Series Data

Tags: cross-validation, MATLAB, svm, time series, validation

What is an appropriate cross-validation technique for time series data?

I have four years of daily time series data and am fitting an SVM model in MATLAB R2015b:

SVMModel = fitcsvm(Input, binary_output,'KernelFunction','RBF','BoxConstraint',1);
CVSVMModel = crossval(SVMModel);
z = kfoldLoss(CVSVMModel)

This is a binary classification problem. I used the default 10-fold cross-validation, but because the folds are assigned at random, I think this method is not suitable for time series data.

Questions:

  1. Is it better to use other techniques like sliding window validation as discussed here?
  2. How can we implement these techniques in MATLAB?
  3. Are there any predefined functions for other proposed techniques?

Best Answer

A sliding window is perhaps the most straightforward solution for time series; see e.g. Hyndman & Athanasopoulos, "Forecasting: Principles and Practice", Chapter 2.5 (bottom of the page), and Rob J. Hyndman's blog post "Time series cross-validation: an R example".
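As a rough illustration of how a sliding (or expanding) window evaluation could be coded by hand in MATLAB, here is a minimal sketch. It reuses the fitcsvm settings from the question; the synthetic Input and binary_output are placeholders so the script runs on its own, and the names initialWindow, horizon and fixedWidth are hypothetical, chosen only for this example. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Rolling-origin evaluation sketch: train on the past, test on the next block.
rng(0);                                   % reproducible placeholder data
n = 400;
Input = randn(n, 3);                      % placeholder features (replace with real data)
binary_output = double(Input(:,1) + 0.5*randn(n,1) > 0);  % placeholder labels

initialWindow = 200;   % assumed size of the first training window
horizon       = 20;    % assumed number of observations in each test block
fixedWidth    = false; % true = sliding window, false = expanding window

starts = initialWindow : horizon : n - horizon;  % last training index per split
err = zeros(numel(starts), 1);

for i = 1:numel(starts)
    trainEnd = starts(i);
    if fixedWidth
        trainIdx = (trainEnd - initialWindow + 1) : trainEnd;
    else
        trainIdx = 1 : trainEnd;
    end
    testIdx = (trainEnd + 1) : (trainEnd + horizon);  % strictly after training data

    Mdl = fitcsvm(Input(trainIdx,:), binary_output(trainIdx), ...
        'KernelFunction', 'RBF', 'BoxConstraint', 1);
    pred = predict(Mdl, Input(testIdx,:));
    err(i) = mean(pred ~= binary_output(testIdx));    % misclassification rate
end

z = mean(err)   % average out-of-sample error over all splits
```

Every test block lies after its training window, so no future observation leaks into the fit. The window sizes are arbitrary here and would need to be chosen to suit the data.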

However, Bergmeir et al. "A note on the validity of cross-validation for evaluating time series prediction" (working paper) suggest that regular leave-$K$-out cross-validation may work well even in a time series context when purely autoregressive models are used. Here is the abstract:

In this work we have investigated the use of cross-validation procedures for time series prediction evaluation when purely autoregressive models are used, which is a very common use-case when using Machine Learning procedures for time series forecasting. In a theoretical proof, we showed that a normal K-fold cross-validation procedure can be used if the lag structure of the models is adequately specified. In the experiments, we showed empirically that even if the lag structure is not correct, as long as the data are fitted well by the model, cross-validation without any modification is a better choice than OOS evaluation. Only if the models are heavily misspecified, are the cross-validation procedures to be avoided as in such a case they may yield a systematic underestimation of the error.

Precise conditions for that to hold are laid out in the working paper.
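To make the "purely autoregressive" setting concrete, the sketch below embeds a series into a matrix of its own lags and then applies ordinary K-fold cross-validation to that matrix. It is only an illustration under assumptions, not the procedure from the working paper: the simulated AR(1) series, the lag order p, and the use of fitrsvm (a regression SVM, since the setting is forecasting rather than the classification problem in the question) are all choices made for this example.

```matlab
% Lag embedding followed by standard K-fold cross-validation (illustrative only).
rng(1);
n = 500;
p = 3;                         % assumed number of lags used as predictors

% Placeholder data: simulated AR(1) series
y = zeros(n, 1);
e = randn(n, 1);
for t = 2:n
    y(t) = 0.7 * y(t-1) + e(t);
end

% Row for time t holds [y(t-1), y(t-2), ..., y(t-p)]
X = zeros(n - p, p);
for k = 1:p
    X(:, k) = y(p + 1 - k : n - k);
end
target = y(p + 1 : n);

% Ordinary (randomly partitioned) 5-fold cross-validation on the embedded data
Mdl   = fitrsvm(X, target, 'KernelFunction', 'linear', 'Standardize', true);
CVMdl = crossval(Mdl, 'KFold', 5);
mse   = kfoldLoss(CVMdl)       % cross-validated mean squared error
```

Whether this estimate is trustworthy depends on the conditions discussed in the paper, in particular on the lag structure being adequately specified.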
