Solved – Is this the right way to apply grid search and 5-fold cross-validation with sklearn?

python, rbf-network, scikit-learn, svm

I am using a support vector machine (SVM) with the RBF kernel for classification.

I would like to split my data so that 80% is the training set and 20% is the validation set.

Then I would like to run a grid search combined with 5-fold cross-validation.

Right now I am doing the following:

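import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV, train_test_split

# load_data() is defined elsewhere (not shown)
digits, labels = load_data()

# Candidate values: C = 2^-5 .. 2^15, gamma = 2^-10 .. 2^-6
Ccand = [np.power(2.0, i) for i in range(-5, 16)]
GammaCand = [np.power(2.0, i) for i in range(-10, -5)]

param_grid = [
        {'C': Ccand, 'gamma': GammaCand, 'kernel': ['rbf']}
]

# Hold out 20% of the data; only the other 80% is passed to the grid search
digits_train, digits_test, labels_train, labels_test = train_test_split(
    digits, labels, test_size=0.2, random_state=0)

svr = svm.SVC()

# Grid search over param_grid with cv=5
clf = GridSearchCV(svr, param_grid, cv=5, verbose=10)
clf.fit(digits_train, labels_train)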

I am not sure what the above code will do. The first line printed is:

Fitting 5 folds for each of 105 candidates, totalling 525 fits

The 105 candidates come from the different parameter combinations (21 values of C times 5 values of gamma). But what about the 5 folds? How are they generated? I initially pick 80% of my data at random as the training set and 20% as the validation set, so I imagine one of the folds is the split I generated with train_test_split. What about the other 4 folds? Does it reuse exactly the same split each time, so that it is not really doing cross-validation, or does it automatically pick a new random split in the same way as train_test_split?
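As a rough check of the counts in that log line, the grid can be enumerated with scikit-learn's ParameterGrid. This is only a sketch that rebuilds the same param_grid; the names n_candidates and n_fits are introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same candidate lists as in the question
Ccand = [np.power(2.0, i) for i in range(-5, 16)]       # 21 values
GammaCand = [np.power(2.0, i) for i in range(-10, -5)]  # 5 values

param_grid = [
    {'C': Ccand, 'gamma': GammaCand, 'kernel': ['rbf']}
]

# ParameterGrid expands the grid into every parameter combination
n_candidates = len(ParameterGrid(param_grid))

# With cv=5, each candidate is fitted once per fold
n_fits = n_candidates * 5

print(n_candidates, "candidates,", n_fits, "fits")
```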

Best Answer

Unless I am mistaken, you are currently running both the grid search and the 5-fold cross-validation on your initial training set only, i.e. the 80% that was randomly sampled at the start; the test set is not used at all. GridSearchCV builds the five folds itself from the data you pass to fit, so the split made by train_test_split is not one of them. For every parameter candidate and every fold, the SVM is trained on 4/5 of that 80% (i.e. 64% of all your data) and evaluated on the remaining 1/5 (i.e. 16%), with each fold serving once as the validation fold.
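To see the 64% / 16% arithmetic concretely, here is a minimal sketch that reproduces the folds outside of GridSearchCV. It assumes scikit-learn's bundled load_digits dataset as a stand-in for load_data(), which is not shown in the question. For a classifier, an integer cv=5 corresponds to StratifiedKFold(n_splits=5), so that splitter is used directly here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, train_test_split

# Assumption: load_digits stands in for the question's load_data()
digits, labels = load_digits(return_X_y=True)

digits_train, digits_test, labels_train, labels_test = train_test_split(
    digits, labels, test_size=0.2, random_state=0)

# The splitter GridSearchCV uses for a classifier when cv=5
cv = StratifiedKFold(n_splits=5)

n_total = len(digits)
fold_fractions = []
# Count how often each training sample appears in a validation fold
val_counts = np.zeros(len(digits_train), dtype=int)

for fold, (fit_idx, val_idx) in enumerate(cv.split(digits_train, labels_train)):
    fit_frac = len(fit_idx) / n_total
    val_frac = len(val_idx) / n_total
    fold_fractions.append((fit_frac, val_frac))
    val_counts[val_idx] += 1
    print(f"fold {fold}: fit on {fit_frac:.0%}, "
          f"validate on {val_frac:.0%} of all data")
```

Each sample of the 80% training set ends up in exactly one validation fold, and the held-out 20% never appears in any of them.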

As a last step, you should use the remaining 20% to evaluate the parameters found by the grid search.
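That final evaluation could look roughly like the sketch below. It again assumes load_digits in place of load_data(), and it uses a deliberately small grid (not the one from the question) so that it runs quickly. With the default refit=True, GridSearchCV retrains the best candidate on the whole training set, so clf.score can be called directly on the held-out data.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: load_digits stands in for the question's load_data()
digits, labels = load_digits(return_X_y=True)

digits_train, digits_test, labels_train, labels_test = train_test_split(
    digits, labels, test_size=0.2, random_state=0)

# Assumption: a reduced grid for illustration, not the question's 105 candidates
param_grid = [
    {'C': [1.0, 10.0], 'gamma': [2.0 ** -10, 2.0 ** -8], 'kernel': ['rbf']}
]

clf = GridSearchCV(svm.SVC(), param_grid, cv=5)
clf.fit(digits_train, labels_train)

print("best parameters:", clf.best_params_)
print("mean cross-validated accuracy:", clf.best_score_)

# The test set was never seen during the search, so this estimates
# how the selected parameters generalize
test_accuracy = clf.score(digits_test, labels_test)
print("held-out test accuracy:", test_accuracy)
```

Note that best_score_ is computed on the validation folds used to select the parameters, so the test accuracy is the less optimistic of the two numbers to report.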
