Leave-one-out cross-validation does not generally lead to better performance than K-fold, and is more likely to be worse, as it has a relatively high variance (i.e. its value changes more for different samples of data than the value for k-fold cross-validation). This is bad in a model selection criterion as it means the model selection criterion can be optimised in ways that merely exploit the random variation in the particular sample of data, rather than making genuine improvements in performance, i.e. you are more likely to over-fit the model selection criterion. The reason leave-one-out cross-validation is used in practice is that for many models it can be evaluated very cheaply as a by-product of fitting the model.
If computational expense is not primarily an issue, a better approach is to perform repeated k-fold cross-validation, where the k-fold cross-validation procedure is repeated with different random partitions into k disjoint subsets each time. This reduces the variance.
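To make the repeated k-fold idea concrete, here is a minimal sketch in plain NumPy (my illustration, not the original answer's code): the k-fold split is repeated with a fresh random partition each time, and averaging the per-repeat estimates reduces the variance of the final estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)

def kfold_mse(X, y, k, rng):
    """Average test MSE of a degree-3 polynomial fit over one random k-fold split."""
    idx = rng.permutation(len(y))
    mses = []
    for f in np.array_split(idx, k):
        train = np.setdiff1d(idx, f)
        coef = np.polyfit(X[train, 0], y[train], deg=3)
        pred = np.polyval(coef, X[f, 0])
        mses.append(np.mean((y[f] - pred) ** 2))
    return np.mean(mses)

# One repeat gives one noisy estimate; averaging many repeats stabilises it.
repeats = [kfold_mse(X, y, k=5, rng=rng) for _ in range(20)]
estimate = np.mean(repeats)
print(estimate)
```

The same effect can be obtained with scikit-learn's `RepeatedKFold` if you prefer a library implementation.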
If you have only 20 patterns, it is very likely that you will experience over-fitting of the model selection criterion, which is a much neglected pitfall in statistics and machine learning (shameless plug: see my paper on the topic). You may be better off choosing a relatively simple model and trying not to optimise it very aggressively, or adopting a Bayesian approach and averaging over all model choices, weighted by their plausibility. IMHO optimisation is the root of all evil in statistics, so it is better not to optimise if you don't have to, and to optimise with caution whenever you do.
Note also if you are going to perform model selection, you need to use something like nested cross-validation if you also need a performance estimate (i.e. you need to consider model selection as an integral part of the model fitting procedure and cross-validate that as well).
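A nested cross-validation sketch, again in plain NumPy and under my own toy assumptions (polynomial degree as the model choice): the inner k-fold selects the degree using only the outer training data, and the outer k-fold scores the whole selection procedure, so the performance estimate is not biased by the model selection itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
x = rng.uniform(-3, 3, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)

def cv_mse(x, y, degree, k, rng):
    """Average test MSE of a polynomial of the given degree over one k-fold split."""
    idx = rng.permutation(len(y))
    mses = []
    for f in np.array_split(idx, k):
        tr = np.setdiff1d(idx, f)
        coef = np.polyfit(x[tr], y[tr], deg=degree)
        mses.append(np.mean((y[f] - np.polyval(coef, x[f])) ** 2))
    return np.mean(mses)

outer_idx = rng.permutation(n)
outer_scores = []
for test in np.array_split(outer_idx, 5):        # outer loop: performance estimate
    tr = np.setdiff1d(outer_idx, test)
    # inner loop: select the degree using only the outer-training data
    best = min(range(1, 6), key=lambda d: cv_mse(x[tr], y[tr], d, k=5, rng=rng))
    coef = np.polyfit(x[tr], y[tr], deg=best)
    outer_scores.append(np.mean((y[test] - np.polyval(coef, x[test])) ** 2))

print(np.mean(outer_scores))  # estimate of the model-selection-plus-fit pipeline
```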
Why would models learned with leave-one-out CV have higher variance?
[TL;DR] A summary of recent posts and debates (July 2018)
This topic has been widely discussed both on this site, and in the scientific literature, with conflicting views, intuitions and conclusions. Back in 2013 when this question was first asked, the dominant view was that LOOCV leads to larger variance of the expected generalization error of a training algorithm producing models out of samples of size $n(K-1)/K$.
This view, however, appears to be an incorrect generalization of a special case and I would argue that the correct answer is: "it depends..."
Paraphrasing Yves Grandvalet, the author of a 2004 paper on the topic, I would summarize the intuitive argument as follows:
- If cross-validation were averaging independent estimates: then leave-one-out CV should see relatively lower variance between models, since we are only shifting one data point across folds and therefore the training sets between folds overlap substantially.
- This is not true when training sets are highly correlated: correlation may increase with $K$, and this increase is responsible for the overall increase of variance in the second scenario. Intuitively, in that situation, leave-one-out CV may be blind to instabilities that exist but are not triggered by changing a single point in the training data, which makes the estimate highly sensitive to the realization of the training set.
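The overlap claim in the first bullet is easy to check by counting (my illustration): two different k-fold training sets share all points except the two held-out folds, so their overlap fraction is $(K-2)/(K-1)$, which approaches 1 as $K$ grows toward $n$ (LOOCV). The fold estimates therefore become strongly correlated rather than independent.

```python
def training_set_overlap(k):
    """Fraction of points shared by two different k-fold training sets.

    Each training set has n(k-1)/k points; two of them share n(k-2)/k points,
    so the shared fraction is (k-2)/(k-1), independent of n.
    """
    return (k - 2) / (k - 1)

for k in [2, 5, 10, 100]:
    print(k, training_set_overlap(k))
# k = 2 gives 0.0 (disjoint training sets); k = 100 gives 98/99, nearly 1
```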
Experimental simulations from myself and others on this site, as well as those of the researchers in the papers linked below, show that there is no universal truth on the topic. Most experiments have monotonically decreasing or constant variance with $K$, but some special cases show increasing variance with $K$.
The rest of this answer proposes a simulation on a toy example and an informal literature review.
[Update] You can find here an alternative simulation for an unstable model in the presence of outliers.
Simulations from a toy example showing decreasing / constant variance
Consider the following toy example where we are fitting a degree 4 polynomial to a noisy sine curve. We expect this model to fare poorly for small datasets due to overfitting, as shown by the learning curve.
Note that we plot 1 - MSE here to reproduce the illustration from ESLII page 243
Methodology
You can find the code for this simulation here. The approach was the following:
- Generate 10,000 points from the distribution $\sin(x) + \epsilon$ where the true variance of $\epsilon$ is known
- Iterate $i$ times (e.g. 100 or 200 times). At each iteration, change the dataset by resampling $N$ points from the original distribution
- For each data set $i$:
- Perform K-fold cross validation for one value of $K$
- Store the average Mean Square Error (MSE) across the K-folds
- Once the loop over $i$ is complete, calculate the mean and standard deviation of the MSE across the $i$ datasets for the same value of $K$
- Repeat the above steps for all $K$ in range $\{ 5,...,N\}$ all the way to Leave One Out CV (LOOCV)
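The steps above can be condensed into a short NumPy sketch (my re-implementation under the stated assumptions; the answer's own code is behind the link above): a degree-4 polynomial on noisy $\sin(x)$, with the mean and standard deviation of the k-fold MSE computed across resampled datasets for each $K$.

```python
import numpy as np

rng = np.random.default_rng(42)

def kfold_mse(x, y, k, rng):
    """Average test MSE of a degree-4 polynomial over one random k-fold split."""
    idx = rng.permutation(len(y))
    mses = []
    for f in np.array_split(idx, k):
        tr = np.setdiff1d(idx, f)
        coef = np.polyfit(x[tr], y[tr], deg=4)
        mses.append(np.mean((y[f] - np.polyval(coef, x[f])) ** 2))
    return np.mean(mses)

N, n_datasets = 40, 100
results = {}
for k in [5, 10, 20, N]:             # K = N is leave-one-out
    per_dataset = []
    for _ in range(n_datasets):      # resample a fresh dataset each iteration
        x = rng.uniform(-3, 3, N)
        y = np.sin(x) + rng.normal(scale=0.3, size=N)
        per_dataset.append(kfold_mse(x, y, k, rng))
    # mean and std of the CV estimate across datasets, for this K
    results[k] = (np.mean(per_dataset), np.std(per_dataset))

for k, (mean, sd) in results.items():
    print(k, round(mean, 3), round(sd, 3))
```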
Impact of $K$ on the Bias and Variance of the MSE across $i$ datasets.
Left-hand side: k-folds for 200 data points; right-hand side: k-folds for 40 data points
Standard Deviation of MSE (across data sets i) vs Kfolds
From this simulation, it seems that:
- For a small number of data points ($N = 40$), increasing $K$ up to $K = 10$ or so significantly improves both the bias and the variance. For larger $K$ there is no effect on either.
- The intuition is that when the effective training size is too small, the polynomial model is very unstable, especially for $K \leq 5$
- For larger $N = 200$, increasing $K$ has no particular impact on either the bias or the variance.
An informal literature review
The following three papers investigate the bias and variance of cross-validation:
Kohavi 1995
This paper is often referred to as the source of the argument that LOOCV has higher variance. In section 1:
“For example, leave-one-out is almost unbiased, but it has high variance, leading to unreliable estimates (Efron 1983)”
This statement is the source of much confusion, because the claim seems to come from Efron in 1983, not Kohavi. Both Kohavi's theoretical arguments and his experimental results go against this statement:
Corollary 2 (Variance in CV)
Given a dataset and an inducer. If the inducer is stable under the perturbations caused by deleting the test instances for the folds in k-fold CV for various values of $k$, then the variance of the estimate will be the same
Experiment
In his experiment, Kohavi compares two algorithms, a C4.5 decision tree and a Naive Bayes classifier, across multiple datasets from the UC Irvine repository. His results are below: the LHS plot shows accuracy vs. folds (i.e. bias) and the RHS plot shows standard deviation vs. folds.
In fact, only the decision tree on three datasets clearly shows higher variance for increasing $K$. The other results show decreasing or constant variance.
Finally, although the conclusion could be worded more strongly, there is no argument there for LOO having higher variance; quite the opposite. From section 6, Summary:
"k-fold cross validation with moderate k values (10-20) reduces the variance... As k decreases (2-5) and the samples get smaller, there is variance due to instability of the training sets themselves."
Zhang and Yang
The authors take a strong view on this topic and clearly state in Section 7.1
In fact, in least squares linear regression, Burman (1989) shows that among the k-fold CVs, in estimating the prediction error, LOO (i.e., n-fold CV) has the smallest asymptotic bias and variance. ...
... Then a theoretical calculation (Lu, 2007) shows that LOO has the smallest bias and variance at the same time among all delete-$n_v$ CVs with all possible $n_v$ deletions considered
Experimental results
Similarly, Zhang's experiments point in the direction of decreasing variance with $K$, as shown below for the true model and the wrong model in Figures 3 and 5.
The only experiment for which variance increases with $K$ is for the Lasso and SCAD models. This is explained as follows on page 31:
However, if model selection is involved, the performance of LOO worsens in variability as the model selection uncertainty gets higher due to large model space, small penalty coefficients and/or the use of data-driven penalty coefficients
Best Answer
I don't know the answer and can't comment, but maybe I can give my simple take: imagine you are using your model to help others by offering a service based on it (like "start planting seeds when the temperature rises above 0 for the next 5 months").
My guess is that training/test splits of the data matter to the customer only a little: the customer most likely doesn't know whether the model was trained on 10 years or 9 years of data; if the model is trained well on 1 year of data, that's good (and computing over 10 years can take close to forever, or at least too long, I can only guess). The whole point of fitting a model on lots of data is to grasp its "complexity", i.e. the tops and bottoms of the data: when things start appearing again and again, we notice "aha, this is what this data is about / how it behaves".
It's about the stationarity of the data source: does the weather act on the same principles as 100 years ago? I guess yes, so feeding the model 100 years of data lets it grasp them; but maybe 10 years of data are enough. I mean, only something that repeats can be forecast. Many things are changing, but many changes are repeatable; if I know it is changing, the change itself repeats :-)
If you can show in the fit (on the training set) that the data is fitted well, say over 200 years of hourly data points, and the model fits very well (no overfitting, of course), there is a very, very good chance it stays the same for the next 5 months (data-source stationarity).
If there is something like "hm, I didn't include the data from nearby; a river was created, so it's colder", that's a problem; if "all" the data that have a big influence on the y data are involved in the model, then, for me, it's a good model.
These cross-validation test/training splits were, I guess, made by model-building people as a bridge toward the people who use the models; but as consumers of a model we always do out-of-sample prediction, so the training set is often all the data currently available. Cross-validation is very good, IMHO; it is cheap in the sense that the model is fitted once and then just makes predictions.
So among the metrics here are things like "rolling": making forecasts and then comparing them to the real values as they arrive. This simulates, as in reality, fitting the model and making predictions iteratively: we sit in a building near the field making a forecast, for example, every day, and when the model predicts "yes, you can start planting", we do.
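A rolling-origin ("simulate the customer") evaluation can be sketched like this (my illustration of the idea above, on a synthetic seasonal series): refit on everything observed up to time $t$, forecast the next point, then compare to the value that actually arrives.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.2, size=len(t))

errors = []
for now in range(150, 199):
    # fit a simple harmonic model on everything observed so far
    past = t[: now + 1]
    A = np.column_stack([np.sin(2 * np.pi * past / 50),
                         np.cos(2 * np.pi * past / 50),
                         np.ones_like(past, dtype=float)])
    coef, *_ = np.linalg.lstsq(A, series[: now + 1], rcond=None)
    # one-step-ahead forecast, then score it against the realised value
    nxt = now + 1
    pred = (coef[0] * np.sin(2 * np.pi * nxt / 50)
            + coef[1] * np.cos(2 * np.pi * nxt / 50) + coef[2])
    errors.append((series[nxt] - pred) ** 2)

print(np.sqrt(np.mean(errors)))  # rolling one-step-ahead RMSE
```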
Cross-validation is a nice, simple, good thing. Rolling simulation of the customer's usage is a nice thing too.
I don't have any article or book to point to; I just wanted to share a simplified view.
I'm not an expert, but if the model parameters fitted on a 10% training set are very different from those fitted on 80% of the data, I would give myself a little pause: a model is just a structure of expected dependencies. If the parameters from 10% differ from those from 80%, it looks like "regime switching", which should either be included in the model (like the river mentioned above), or handled by another model that grasps changes the original model did not expect to come or didn't pay attention to (or that weren't spotted in the early stages). The good case is when the model grasps the structure well, so that a model fitted on 10% of the data provides the "same performance" as one fitted on 80%; if the parameters remain similar (or the same), I would call that robustness of the parameters, or robustness of the model: it works stably over the whole data. "It models the data well :-)"
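The parameter-stability check described above can be sketched in a few lines (my illustration, on synthetic data): fit the same model on 10% and on 80% of the data and compare the coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-3, 3, n)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)  # stable linear structure

idx = rng.permutation(n)
small, large = idx[: n // 10], idx[: int(0.8 * n)]

coef_small = np.polyfit(x[small], y[small], deg=1)
coef_large = np.polyfit(x[large], y[large], deg=1)

# Similar coefficients across very different sample sizes suggest the model
# has captured a stable structure ("robustness of parameters"); a large gap
# would hint at regime switching or a missing variable.
print(coef_small, coef_large, np.abs(coef_small - coef_large).max())
```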