Solved – difference between nested cross-validation and hold-out validation

I understand that for small sample sizes it's best not to opt for traditional hold-out validation, but rather to use the entire dataset and perform a $k$-fold cross-validation. However, in my case I want to find the best hyper-parameters and select the best model, and I understand that nested cross-validation may be the better option here.

My understanding of nested cross-validation is: take your entire dataset and split it into 2 folds. On one fold you perform a $k$-fold cross-validation to determine the best hyper-parameters, and on the other fold you perform a $k$-fold cross-validation to select the best model. Is my understanding of cross-validation correct, and if so, how is this any different from hold-out validation?

Best Answer

You have fundamentally misunderstood the concept of cross-validation.

  • $k$-fold cross-validation: one loop.

    • Equally divide the dataset into $k$ folds.
    • For each $i \in [1;k]$, test the model on the $i$-th fold and train on the remaining folds.
    • The final result is the average over the $k$ testing folds.
  • Nested $k$-fold cross-validation: two loops.

    • For every $i$ above, there is a $k$-fold cross-validation nested inside, run on the remaining (training) folds.
  • Hold-out: literally hold a set of data out for testing.
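The single-loop structure of plain $k$-fold cross-validation above can be sketched as follows. This is a minimal sketch assuming scikit-learn is available; the synthetic dataset and logistic-regression model are placeholders, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Placeholder data and model for illustration.
X, y = make_classification(n_samples=100, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # equally divide into k folds
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on the remaining folds, test on the i-th fold.
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

final = np.mean(scores)  # average over the k testing folds
```

Note the single loop: every fold serves as the test set exactly once.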

In your case, when you split the data into 2 folds and perform a different task on each fold, you are not doing cross-validation. The key thing to remember is that whatever you want to do, you have to perform it independently for every $i \in [1;k]$.

Thus, in your case, the correct way should be:

  • In the inner loop, select the best hyper-parameters by training models with different hyper-parameter settings on the inner training folds and testing on the inner testing fold.
  • Then test the best model selected by the inner loop on the outer testing fold.
  • The average of the outer-loop results is the estimated performance of your model.
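The steps above can be sketched with scikit-learn (assumed available), where `GridSearchCV` plays the role of the inner loop and `cross_val_score` the outer loop; the SVC model and the `C` grid are illustrative placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Placeholder data for illustration.
X, y = make_classification(n_samples=100, random_state=0)

# Inner loop: for each outer training set, pick the best hyper-parameters
# by an inner k-fold cross-validation over the candidate values of C.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

# Outer loop: the winning model from each inner search is tested on the
# corresponding held-out outer fold.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)
outer_scores = cross_val_score(inner, X, y, cv=outer_cv)

estimate = outer_scores.mean()  # estimated performance of the model
```

Because the inner hyper-parameter search is repeated independently inside every outer fold, the outer score is not contaminated by the tuning, which is exactly what distinguishes this from a single hold-out split.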