Solved – Leave-one-out and stratified 10-fold cross-validation

Tags: cross-validation, data mining, distributions

I am confused with the answers to the questions below:

Assume that we have a dataset D with 100 examples, 50 of which belong to the class 'good' and 50 to the class 'poor'. Assume further that we have a very naïve learning algorithm (L) that produces a model that simply predicts the majority class of the training examples (if there is no majority, it always predicts 'good').

1) Performing stratified 10-fold cross validation on D will give the same result as when using half the examples of each class for training and the other half for testing. Answer> True

2) Leave-one-out cross validation will give a better accuracy estimate for how L will perform on examples that are independently sampled from the same distribution as D than when using half the examples for training and half the examples for testing. Answer> False.

May I know why the first answer is true and the second is false?

Best Answer

As L predicts the majority class (positive, i.e. 'good', if there is no majority), a model trained on the full dataset will always predict positive ('good').

Q1:

For stratified cross-validation, each training set will contain an equal number of patterns from each class (otherwise it wouldn't be stratified), so L will again always predict positive ('good'). Likewise, if we use half of the available examples of each class for training (i.e. 25 'good' and 25 'poor'), L will always predict 'good', so the answer to question 1 is true (although neither estimate is very informative). A quick sketch of both is below.
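Here is a minimal sketch that reproduces this numerically; the `MajorityClassifier` class and all the variable names are mine rather than anything from the question, and it assumes NumPy and scikit-learn are installed:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


class MajorityClassifier:
    """Predicts the majority class of its training set; ties go to 'good'."""

    def fit(self, X, y):
        # Count each class; 'good' wins ties, matching L's tie-breaking rule.
        self.prediction_ = "poor" if np.sum(y == "poor") > np.sum(y == "good") else "good"
        return self

    def predict(self, X):
        return np.full(len(X), self.prediction_)


# Dataset D: 100 dummy one-feature examples, 50 'good' and 50 'poor'.
X = np.arange(100).reshape(-1, 1)
y = np.array(["good"] * 50 + ["poor"] * 50)

# Stratified 10-fold CV: every training fold holds 45 of each class (a tie),
# so L always predicts 'good' and is right on exactly the 5 'good' test cases.
fold_accs = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf = MajorityClassifier().fit(X[train_idx], y[train_idx])
    fold_accs.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))
print("stratified 10-fold estimate:", np.mean(fold_accs))  # 0.5

# 50/50 split with 25 of each class on each side: same prediction, same 0.5.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=0)
clf = MajorityClassifier().fit(X_tr, y_tr)
print("50/50 split estimate:", np.mean(clf.predict(X_te) == y_te))  # 0.5
```

Both estimates come out at exactly 0.5, i.e. the fraction of 'good' patterns in each test set.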

Q2:

For leave-one-out cross-validation, on the other hand, if a positive ('good') pattern is held out in a particular fold, the majority of the training examples (50 out of 99) will be negative ('poor'), so L will classify it as 'poor' and get the answer wrong. If a negative ('poor') pattern is held out, the majority of the training examples (50 out of 99) will be positive ('good'), so L will classify it as 'good' and again get the answer wrong. This means the leave-one-out estimate of the accuracy is exactly zero.
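Extending the same sketch to leave-one-out (reusing `MajorityClassifier`, `X` and `y` from the snippet above) shows the estimate collapsing to zero:

```python
from sklearn.model_selection import LeaveOneOut

hits = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = MajorityClassifier().fit(X[train_idx], y[train_idx])
    # Holding out a 'good' makes 'poor' the training majority (and vice
    # versa), so every one of the 100 folds gets its single test case wrong.
    hits.append(clf.predict(X[test_idx])[0] == y[test_idx][0])
print("leave-one-out estimate:", np.mean(hits))  # 0.0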

If you use a 50-50 train/test split instead, L will classify every test pattern as positive ('good'), so the estimated generalisation performance will be 50%, which matches the true performance if the test patterns are indeed drawn from the same distribution as D.

This means that leave-one-out cross-validation does not give a more accurate performance estimate (here it is maximally inaccurate), so the answer "false" is correct.
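To see just how far off it is, you can compare both estimates against a large fresh sample drawn from the same 50/50 distribution (a rough simulation, again reusing the definitions from the snippets above):

```python
rng = np.random.default_rng(0)
X_fresh = np.zeros((10_000, 1))                      # features are irrelevant to L
y_fresh = rng.choice(["good", "poor"], size=10_000)  # fresh 50/50 i.i.d. labels
clf = MajorityClassifier().fit(X, y)                 # trained on all of D: predicts 'good'
print("approx. true accuracy:", np.mean(clf.predict(X_fresh) == y_fresh))  # ~0.5
```

The 50/50 split estimate (0.5) is essentially spot on, while the leave-one-out estimate (0.0) is off by the maximum possible amount.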

Summary:

I am not sure what the point of the question was; I suspect it was intended to show that there is no perfect, universal performance evaluation scheme and that there are pathological cases where the leave-one-out estimator fails utterly.