Solved – Multiple cross-validation and multiple train-test splits

cross-validation, training error, validation

Suppose we have only four observations in a dataset. Let's call them a, b, c and d.

If we perform k-fold cross-validation with k=2, we might get the following:

We get two groups of data, (a,b) and (c,d). We first train on (a,b), then validate our machine learning model on (c,d). Then we train on (c,d) and validate on (a,b).
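This procedure can be sketched with scikit-learn's `KFold` splitter (assuming scikit-learn is installed; the array `X` is just a placeholder for the four observations):

```python
# Minimal sketch of 2-fold CV on four observations with scikit-learn's KFold.
# Without shuffling, the folds are exactly (a,b) and (c,d).
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # placeholder features for a, b, c, d
labels = np.array(["a", "b", "c", "d"])

kf = KFold(n_splits=2, shuffle=False)
for train_idx, test_idx in kf.split(X):
    print("train:", labels[train_idx], "validate:", labels[test_idx])
```

Each observation appears in the validation set exactly once, which is the defining property of a single k-fold pass.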

Is this really a complete 2-fold procedure?

Because, if we shuffle the data, we could get two other groups, let's say (a,c) and (b,d).

So this time, we would need to do another 2-fold cross-validation.

So my first question is: do we really need to perform multiple k-fold cross-validations, shuffling the data each time, in order to get a good estimate of the performance of our model?
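Repeating k-fold with a fresh shuffle each time is exactly what scikit-learn's `RepeatedKFold` does, so the idea can be sketched like this (the tiny four-row `X` is again just a placeholder):

```python
# Sketch: 2-fold CV repeated 3 times, each repetition with a new shuffle.
# This yields 3 * 2 = 6 train/validate splits in total.
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(4).reshape(-1, 1)          # placeholder for observations a, b, c, d
labels = np.array(["a", "b", "c", "d"])

rkf = RepeatedKFold(n_splits=2, n_repeats=3, random_state=0)
for i, (train_idx, test_idx) in enumerate(rkf.split(X)):
    print(f"split {i}: train={labels[train_idx]} validate={labels[test_idx]}")
```

Within each repetition the two validation folds still partition the data; across repetitions the groupings differ because of the reshuffle.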

With k = number of observations, i.e. a leave-one-out procedure, there is of course no need to do so, since shuffling the data will give us the same groups.
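That shuffle-invariance is easy to check with a small sketch using scikit-learn's `LeaveOneOut`: whatever the row order, the collection of held-out sets is always the same n singletons.

```python
# Sketch: leave-one-out CV holds out each observation exactly once,
# so the set of splits does not depend on how the data are shuffled.
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(4).reshape(-1, 1)          # placeholder for four observations
loo = LeaveOneOut()
test_sets = [set(test_idx) for _, test_idx in loo.split(X)]
print(test_sets)
```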

Finally, if the answer to the above question is yes, what is the advantage of (multiple) cross-validation compared to just multiple train-test splits?

For example, we could shuffle the dataset, take the first 80% of the observations for training and the last 20% for testing, and repeat this multiple times.

Am I totally wrong, or is a single k-fold not enough to assess the performance of a model? And if so, what is the difference between it and doing multiple train-test splits?

Thanks guys

Best Answer

Your original design is in fact a 2-fold cross-validation.

As to why you do or do not need to repeat k-fold cross-validation, I believe your question is a duplicate of this one: How many times should we repeat a K-fold CV?