Solved – Evaluation method when using a large training set and a small test set

bootstrap, classification, cross-validation, machine learning

I need to evaluate two text classifiers. I have a large training dataset (to be used for training only) and a separate, small test set (to be used for testing only), both balanced. Which of the following methods is the most appropriate, and why?

1) Stratified repeated held-out evaluation (repeated subsampling):
Sample k times without replacement from the training dataset, each sample being balanced. For every sample, the classifier is trained on the sample, and accuracy is measured on the full test set. Results are averaged.
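To make the option concrete, here is a minimal sketch using scikit-learn; the array names (X_train, y_train, X_test, y_test) and the LogisticRegression stand-in classifier are my assumptions, not part of the question.

```python
# A minimal sketch of option 1, assuming numpy arrays X_train, y_train,
# X_test, y_test and a LogisticRegression standing in for the text classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit

def repeated_heldout_accuracy(X_train, y_train, X_test, y_test, k=10, frac=0.5):
    # Each split draws a stratified subsample of the training set without
    # replacement; train_size controls the subsample fraction.
    splitter = StratifiedShuffleSplit(n_splits=k, train_size=frac, random_state=0)
    scores = []
    for sample_idx, _ in splitter.split(X_train, y_train):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[sample_idx], y_train[sample_idx])
        scores.append(clf.score(X_test, y_test))  # accuracy on the full test set
    return np.mean(scores)
```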

2) Stratified cross-validation:
Divide the full training dataset into k equally sized slices, each slice being balanced. For every slice, the classifier is trained on the slice, and accuracy is measured on the full test set. Results are averaged.
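Option 2 can be sketched the same way, with the same assumed names as above. Note that each slice is used for training, the reverse of ordinary cross-validation, so the held-out fold from StratifiedKFold serves as the training slice here.

```python
# A minimal sketch of option 2, with the same assumed array names and
# stand-in classifier as above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def slice_accuracy(X_train, y_train, X_test, y_test, k=10):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    # StratifiedKFold yields (rest, fold) index pairs; here each fold is the
    # balanced slice the classifier is trained on.
    for _, slice_idx in skf.split(X_train, y_train):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[slice_idx], y_train[slice_idx])
        scores.append(clf.score(X_test, y_test))  # accuracy on the full test set
    return np.mean(scores)
```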

3) Stratified bootstrapping:
Sample k times with replacement from the large training dataset, each sample being balanced. For every sample, the classifier is trained on the sample, and accuracy is measured on the full test set. Results are averaged.
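And a minimal sketch of option 3, again with the same assumed names; stratifying the resampling keeps each bootstrap sample balanced.

```python
# A minimal sketch of option 3, with the same assumed names as above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def bootstrap_accuracy(X_train, y_train, X_test, y_test, k=10):
    scores = []
    for i in range(k):
        # Draw a bootstrap sample (with replacement), stratified on the
        # labels so each sample stays balanced.
        X_b, y_b = resample(X_train, y_train, replace=True,
                            stratify=y_train, random_state=i)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_b, y_b)
        scores.append(clf.score(X_test, y_test))  # accuracy on the full test set
    return np.mean(scores)
```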

Best Answer

I would suggest performing repeated k-fold cross-validation on your training set in order to perform feature and model selection. Each repetition randomly partitions the training set into k folds; each fold in turn serves as the validation set (e.g. 10% of the original training set for k = 10), with the classifier trained on the remaining folds (the other 90%).
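A minimal sketch of this suggestion, assuming scikit-learn, the same array names as above, and a LogisticRegression with a toy regularization grid standing in for the real feature and model selection choices:

```python
# Repeated stratified k-fold CV on the training set only, for model selection.
# X_train/y_train, the classifier, and the C grid are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def select_model(X_train, y_train):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
    best_C, best_score = None, -np.inf
    for C in [0.01, 0.1, 1.0, 10.0]:  # candidate models
        scores = cross_val_score(LogisticRegression(C=C, max_iter=1000),
                                 X_train, y_train, cv=cv, scoring="accuracy")
        if scores.mean() > best_score:
            best_C, best_score = C, scores.mean()
    return best_C, best_score  # CV estimate for the selected model
```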

You can then apply the selected model to your test set to obtain an independent, unbiased estimate of your classifier's generalization performance.
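For instance, continuing the sketch above (select_model, best_C, and the array names are all assumptions from that sketch):

```python
# Refit once on all training data with the selected settings, then touch the
# test set exactly once for the final, unbiased accuracy estimate.
from sklearn.linear_model import LogisticRegression

best_C, _ = select_model(X_train, y_train)           # CV-based selection
final_clf = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_accuracy = final_clf.score(X_test, y_test)      # evaluated only once
```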

Edits based on comments below:

  1. The purpose of the 10-fold CV is to validate the procedure you use to train your model and to obtain an unbiased performance estimate. Once you are happy with the estimate produced through CV on the training set, you can obtain your final trained model by applying the feature and model selection procedures to all of the training data.
  2. You should test your model on the test set only once, using all the available test set data.
  3. I have never used bootstrapping, so I can't comment beyond noting that I believe it to be an equally valid approach.