Solved – Splitting data into training, validation and test sets

classification, data mining, machine learning

I am currently comparing 3 classification methods on a data set (in R).

To do so, I run the comparison in a loop (100 iterations). I have split my data into a training, validation and test set.

The training data is used to build each classifier. Each classifier is then used to predict every point in the validation data, and the classification method with the highest accuracy on the validation data is then applied to the test data.
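For concreteness, here is a minimal sketch of this procedure in R. It assumes a data frame `dat` with a factor response `y`, and the candidate methods shown (a classification tree via `rpart` and logistic regression via `glm`) are placeholders for the actual classifiers being compared, not a prescription:

```r
library(rpart)  # classification tree (one of the placeholder candidates)
set.seed(1)

# Random 60/20/20 split into training, validation and test sets
n   <- nrow(dat)
idx <- sample(n)
train <- dat[idx[seq_len(round(0.6 * n))], ]
valid <- dat[idx[(round(0.6 * n) + 1):round(0.8 * n)], ]
test  <- dat[idx[(round(0.8 * n) + 1):n], ]

accuracy <- function(pred, truth) mean(pred == truth)

# Each candidate is a function that fits on `tr` and predicts labels for `new`
candidates <- list(
  tree  = function(tr, new)
    predict(rpart(y ~ ., data = tr, method = "class"), new, type = "class"),
  logit = function(tr, new) {
    p <- predict(glm(y ~ ., data = tr, family = binomial), new, type = "response")
    factor(ifelse(p > 0.5, levels(tr$y)[2], levels(tr$y)[1]), levels = levels(tr$y))
  }
)

# Selection step: accuracy of every candidate on the validation set
val_acc <- sapply(candidates, function(f) accuracy(f(train, valid), valid$y))
best    <- names(which.max(val_acc))

# Only the winning method is scored once on the held-out test set
test_acc <- accuracy(candidates[[best]](train, test), test$y)
```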

My question is, is there a specific name given to what I am doing here?

Best Answer

What you are doing, in its simplest form, is model selection: you explore different ML models (or hyper-parameter configurations) and pick the best one according to your success criterion before moving on. You could also use cross validation here, which gives a more accurate and statistically reliable estimate than a single validation set. Beyond your question, note that when comparing classification algorithms, accuracy may not be a good choice, for the reasons listed here, especially (but not only) on imbalanced datasets.
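As a hedged sketch of the cross-validation variant of the selection step, reusing the `dat`, `candidates` and `accuracy` objects assumed in the earlier snippet, the single validation set can be replaced by an average over k folds:

```r
set.seed(1)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold assignment

# For each candidate, average its accuracy over the k held-out folds
cv_acc <- sapply(candidates, function(f) {
  mean(sapply(1:k, function(i) {
    tr <- dat[folds != i, ]   # train on k-1 folds
    va <- dat[folds == i, ]   # validate on the remaining fold
    accuracy(f(tr, va), va$y)
  }))
})

best <- names(which.max(cv_acc))  # selection based on the fold average
```

Averaging over folds reduces the variance of the selection compared with a single random validation split, which is why it tends to be more statistically reliable.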