I have a sample of 415 observations. With a sample of this size, is it possible to use cross-validation?
Solved – Cross validation and small samples
cross-validation, small-sample
Related Solutions
Theoretical considerations aside, the Akaike Information Criterion is just the likelihood penalized by the number of parameters. It follows that AIC accounts for uncertainty in the data (-2LL) and assumes that more parameters lead to a higher risk of overfitting (2k). Cross-validation just looks at the test-set performance of the model, with no further assumptions.
If you care mostly about making predictions and you can assume that the test set(s) would be reasonably similar to the real-world data, you should go for cross-validation. The possible problem is that when your data is small, splitting it leaves you with small training and test sets. Less data for training is bad, and a smaller test set makes the cross-validation results more uncertain (see Varoquaux, 2018). If your test sample is insufficient, you may be forced to use AIC, keeping in mind what it measures and what assumptions it makes.
On the other hand, as already mentioned in the comments, AIC offers asymptotic guarantees, which do not hold for small samples. Small samples may be misleading about the uncertainty in the data as well.
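To make the contrast concrete, here is a minimal sketch comparing the two criteria on simulated regression data; the dataset and candidate feature sets are made up for illustration, and statsmodels/scikit-learn are assumed:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 415
X = rng.normal(size=(n, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)  # only 2 informative features

for k in (1, 2, 5):  # candidate models using the first k features
    Xk = X[:, :k]
    # AIC: -2 log-likelihood + 2 * (number of parameters), from the fitted OLS
    aic = sm.OLS(y, sm.add_constant(Xk)).fit().aic
    # CV: mean held-out MSE over 10 folds, with no likelihood assumptions
    mse = -cross_val_score(LinearRegression(), Xk, y,
                           cv=10, scoring="neg_mean_squared_error").mean()
    print(f"k={k}: AIC={aic:.1f}, CV MSE={mse:.3f}")
```

Both criteria should favour the two-feature model here; the difference is that AIC gets there from the in-sample likelihood plus a penalty, while CV pays for it with data splitting.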
Should I divide it into a train and test set with, for example, 311 (75%) train observations and 104 (25%) test observations, and perform cross-validation on the train set?
Yes
Or should I perform the cross-validation on the entire data set?
No
The test set should be handled independently of the training set, so you could run a separate CV block on the test set if you really wish; it may provide some useful insight, but it is not universal practice. CV may be useful if you plan to apply the model to a completely new set of ‘real world’ data. Given that the test set is drawn from the same population as the training set, this may not be that useful: you would expect it to have similar characteristics to the training set if the split was performed correctly and without bias. Mind you, it may be worth checking this assumption.
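A minimal sketch of the workflow described above, with scikit-learn and a simulated dataset standing in for your data, and logistic regression as an arbitrary placeholder estimator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=415, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # ~311 train / ~104 test

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)  # CV on train set only
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Final model is fit on all training data and touches the test set exactly once
model.fit(X_train, y_train)
print("Test accuracy: %.3f" % model.score(X_test, y_test))
```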
What is the purpose of CV?
So long as the aim of performing cross-validation is to acquire a more robust estimate of the test MSE
This is not the purpose of CV; rather, it is to estimate the robustness of your performance metrics. As @user86895 states, it does not measure MSE (see Mean squared error versus Least squared error, which one to compare datasets? for further reading). CV creates multiple models on subsets of the data and applies each to the data withheld from that subset. It iterates over the dataset, building new models, until all observations have been included in training subsets and all have been included in test subsets. The final model is built on the whole training set, not on any of the individual CV-round models; the purpose of CV is not to build models but to assess the stability of the model's performance, i.e. how generalisable the model is.
When comparing different data processing or analysis algorithms on a dataset, it provides a first filter to identify the work pathways that produce the most stable models. It does this by estimating how variable the performance is between subsets of your training set. This allows you to detect models with a very high risk of overfitting and filter them out. Without cross-validation you would be picking based solely on maximum performance, without regard to its stability. But when you come to apply a model in a deployed situation, its stability (relevance across the real-world population) will be more important than moderate differences in raw performance on a subset of curated samples (i.e. your original experimental set).
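As a hedged sketch of that filtering idea: compare not just the mean but the spread of per-fold scores across candidate models. The two decision trees below are arbitrary choices on simulated data, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=415, n_informative=5, random_state=0)

candidates = {
    "shallow tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep tree": DecisionTreeClassifier(max_depth=None, random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=10)
    # A large standard deviation across folds flags an unstable,
    # overfitting-prone model, even if its mean looks competitive
    print(f"{name}: mean={scores.mean():.3f}, sd={scores.std():.3f}")
```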
Cross-validation is in fact essential for choosing the most basic parameters of a model, such as the number of components in PCA or PLS, using the Q2 statistic (which is R2 but computed on the held-out data, see What is the Q² value for each component of a PCA) to determine when overfitting starts to degrade model performance.
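A minimal sketch of that component-selection loop for PLS, approximating Q2 as the cross-validated R2 on held-out folds; the data are simulated and the component range is arbitrary:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(415, 20))
y = X[:, :3].sum(axis=1) + rng.normal(size=415)  # 3 informative directions

for n_comp in range(1, 8):
    q2 = cross_val_score(PLSRegression(n_components=n_comp), X, y,
                         cv=10, scoring="r2").mean()
    # Q2 should rise with useful components, then plateau or fall
    # once extra components start fitting noise
    print(f"{n_comp} components: Q2 ~ {q2:.3f}")
```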
If I am mistaken, how could I use the cross-validation result to predict out of sample observations?
I am taking this to mean 'how can I use CV result to estimate performance beyond my experimental set?', but will update this section of my answer if it is clarified differently.
CV is used as a first-line estimate of model stability, not to estimate performance in real-world settings. The only way to do that is to test the final model in a real-world situation. What CV provides is a risk analysis: if the model appears stable, you could decide it is time to risk it on a real-world test. If it is not stable, you probably need to expand your training set considerably and build a new model, ensuring an even representation of important subgroups and confounding factors; these, alongside random noise, are a source of overfitting, since all relevant variation needs equal exposure to the model-building process to be properly weighted.
And a note on real-world validation: if it works, it doesn't prove your model is generalisable, only that it works under the specific mechanisms whereby it has been deployed in the real world.
Best Answer
Generally speaking, as you decrease the sample size, your cross-validation (CV) variance will increase. It also depends on the dimensionality of the dataset (i.e. whether you have many more variables than samples). If you have a very high-dimensional dataset, your variance may be higher still, and it may be best to pursue feature selection methods (another topic).
As for an absolute cutoff, there is no such thing. There was a study, admittedly slightly dated, on whether CV is appropriate for small sample sizes in microarray studies; the sample sizes were all less than 120. Its conclusions mirror what I mentioned above, with the additional point that bootstrap methods can be an alternative, at higher computational cost and with some increased bias.
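For intuition, here is a rough sketch of a plain out-of-bag bootstrap estimate (not the full .632+ correction) on a dataset of the size that study considered; the data and estimator are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=120, random_state=0)
rng = np.random.default_rng(0)
oob_scores = []
for _ in range(200):  # 200 refits: the higher computational cost in action
    idx = rng.integers(0, len(y), len(y))       # resample with replacement
    oob = np.setdiff1d(np.arange(len(y)), idx)  # out-of-bag observations
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    oob_scores.append(model.score(X[oob], y[oob]))
print("bootstrap OOB accuracy: %.3f" % np.mean(oob_scores))
```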
Another point to consider is the class distribution of your data. Is your outcome binary or multiclass? Are your classes balanced? There are many methods to address the situation where one class is rarer than the other(s), such as stratified CV, among others. All in all, I suspect you shouldn't have a problem applying CV to 415 samples. Even with 10-fold CV you would have ~40 samples in each fold, which is far more than many published studies can boast (in the biological literature).
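A hedged sketch of stratified 10-fold CV at exactly this sample size, which keeps the class proportions roughly equal in every fold; the imbalanced dataset and estimator here are simulated stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# ~415 samples with a 4:1 class imbalance, as a stand-in for real data
X, y = make_classification(n_samples=415, weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("per-fold accuracy:", scores.round(3))  # each fold holds ~41-42 samples
```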