What is an appropriate cross-validation technique for time series data?
I have four years of daily time series data and am fitting an SVM model in MATLAB R2015b:
SVMModel = fitcsvm(Input, binary_output,'KernelFunction','RBF','BoxConstraint',1);
CVSVMModel = crossval(SVMModel);
z = kfoldLoss(CVSVMModel)
This is a binary classification problem. I used the default 10-fold cross-validation, but because it assigns observations to folds at random, I do not think it is suitable for time series data.
Questions:
- Is it better to use other techniques like sliding window validation as discussed here?
- How can we implement these techniques in MATLAB?
- Are there any predefined functions for other proposed techniques?
Best Answer
A sliding window is perhaps the most straightforward solution for time series; see, e.g., Hyndman & Athanasopoulos, "Forecasting: Principles and Practice", Chapter 2.5 (bottom of the page), and Rob J. Hyndman's blog post "Time series cross-validation: an R example".
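As a rough illustration of how a sliding-window evaluation could be coded by hand in MATLAB, here is a minimal sketch. It reuses the `fitcsvm` settings from the question; the synthetic data and the values of `trainLen`, `testLen` and `step` are assumptions chosen only so that the example runs on its own, and would need to be replaced and tuned for the real series.

```matlab
% Placeholder data so the sketch runs on its own; replace with the real
% Input (n-by-p predictors) and binary_output (n-by-1 labels), in time order.
rng(1);
n = 400;
Input = randn(n, 3);
binary_output = double(Input(:,1) + 0.5*randn(n,1) > 0);

trainLen = 200;   % assumed length of each training window
testLen  = 20;    % assumed length of each test block
step     = 20;    % assumed shift between consecutive windows

% First index of every training window that still leaves a full test block
starts = 1:step:(n - trainLen - testLen + 1);
err = zeros(numel(starts), 1);

for k = 1:numel(starts)
    trainIdx = starts(k):(starts(k) + trainLen - 1);
    testIdx  = (trainIdx(end) + 1):(trainIdx(end) + testLen);

    % Fit only on the past window, with the same settings as in the question
    mdl = fitcsvm(Input(trainIdx,:), binary_output(trainIdx), ...
        'KernelFunction', 'RBF', 'BoxConstraint', 1);

    % Misclassification rate on the block that immediately follows the window
    pred = predict(mdl, Input(testIdx,:));
    err(k) = mean(pred ~= binary_output(testIdx));
end

z = mean(err)   % average out-of-sample error across windows
```

Each model is trained only on observations that precede its test block, so no future information leaks into training. Setting `trainIdx = 1:(starts(k) + trainLen - 1)` instead would give an expanding window rather than a sliding one.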
However, Bergmeir et al., "A note on the validity of cross-validation for evaluating time series prediction" (working paper), suggest that regular leave-$K$-out cross-validation may work well even in a time series context when purely autoregressive models are used. Here is the abstract:
Precise conditions for that to hold are laid out in the working paper.