Solved – Splitting data into training and test sets

Tags: cross-validation, machine-learning, train

I am implementing an EEG classifier with 15 subjects (patients), specifically a support vector machine classifier.

I randomly chose the training and testing sets, but I was asked, "How did you choose the subjects in each set?" I looked for an answer but couldn't find a good one (ordinary cross-validation didn't seem like the best solution in my case).

Could you please help me with this problem?

Best Answer

I am assuming you are seeking to classify the EEG data into one or more disease states, e.g. seizure/non-seizure, pathological/non-pathological, etc.

The best way to validate a classifier for an application like this is leave-one-subject-out cross-validation: a form of leave-one-out where the held-out unit is an entire patient, so no subject's data ever appears in both the training and test sets.

What I mean by this is to start with all data for patient 1 as the test set and all data for patients 2-15 as the training set, and store the results. Next, set the data for patient 2 as the test set and the remainder as the training set. Do this for each patient's data in turn, so that you have 15 classification results, one per patient. Then take the mean of these 15 values, and you have an estimate of your classifier's performance on data from unseen subjects.
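As a rough sketch of this procedure, scikit-learn's `LeaveOneGroupOut` splitter does exactly this when each group is one subject. The feature matrix, labels, and subject IDs below are synthetic stand-ins; with real EEG data you would substitute your own extracted features and labels.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects = 15
epochs_per_subject = 20  # hypothetical: 20 EEG epochs per patient

# Synthetic stand-ins for real EEG features and class labels
X = rng.normal(size=(n_subjects * epochs_per_subject, 8))
y = rng.integers(0, 2, size=n_subjects * epochs_per_subject)

# groups[i] identifies which patient epoch i belongs to; this is what
# guarantees a whole patient is held out in each fold
groups = np.repeat(np.arange(n_subjects), epochs_per_subject)

logo = LeaveOneGroupOut()
scores = cross_val_score(SVC(kernel="rbf"), X, y, groups=groups, cv=logo)

# One accuracy score per held-out patient; the mean estimates
# performance on unseen subjects
print(len(scores))
print(scores.mean())
```

Because the labels here are random, the mean accuracy will hover around chance; with real features the same loop gives you the per-patient scores described above.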