Solved – Error while performing multiclass classification using Gridsearch CV

hyperparameter, machine learning, multi-class, scikit learn, svm

I am trying to solve a multiclass classification problem using SVC as the base estimator and GridSearchCV to tune its hyperparameters. Below is the code and the error I receive:

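import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svc_clf = SVC(C=0.7, tol=0.01, kernel='rbf', cache_size=500)
param_grid = {'C': np.linspace(0.1, 1.0, 10), 'tol': [1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1], 'kernel': ['rbf', 'poly'], 'degree': [3, 4, 5, 6, 7]}

# scoring='f1' computes F1 with average='binary', which raises the error below on multiclass targets
gs_svc = GridSearchCV(estimator=svc_clf, param_grid=param_grid, scoring='f1', cv=5)

gs_svc.fit(X_train, y_train)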

Below is the error received:

ValueError: Target is multiclass but average='binary'. Please choose another average setting.

From my research, I found that the 'f1' scoring option isn't meant for multi-class classification.
Which metric should be used with GridSearchCV for a multi-class classification problem?

Best Answer

Accuracy might look tempting, but it is not a good metric in general, particularly when the classes are imbalanced. In multiclass classification, each class has its own F1 score, precision, recall and so on. You need to decide how to average them, which is what the error is actually telling you. The options for the average parameter are binary (the default), micro, macro, weighted and samples. The binary option needs one positive and one negative class, so it doesn't work for multiclass targets, and samples applies only to multilabel problems.

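To reiterate the sklearn documentation, the micro option counts TP, FP, etc. globally across all classes, while macro computes the metric for each class separately and takes the unweighted mean. weighted is the same as macro except that each class's score is weighted by its support (the number of true instances of that class), which accounts for class imbalance.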
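As a rough illustration of how these averages differ, here is a minimal sketch on made-up labels. The toy y_true and y_pred arrays are assumptions chosen only for the example; they are not from the question's data.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical, imbalanced 3-class labels (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

# average=None returns one F1 score per class
per_class = f1_score(y_true, y_pred, average=None)

# micro: pool TP/FP/FN over all classes, then compute a single F1
micro = f1_score(y_true, y_pred, average='micro')

# macro: unweighted mean of the per-class F1 scores
macro = f1_score(y_true, y_pred, average='macro')

# weighted: mean of the per-class F1 scores weighted by class support
weighted = f1_score(y_true, y_pred, average='weighted')

# Support = number of true instances of each class
support = np.bincount(y_true)

print('per class:', per_class)
print('micro:', micro, 'macro:', macro, 'weighted:', weighted)
```

For single-label multiclass data like this, the micro average coincides with plain accuracy, while macro treats every class equally regardless of how many samples it has.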

This parameter needs to be passed to the scoring function by wrapping it in a scorer, e.g.:

from sklearn.metrics import f1_score, make_scorer

# Scorer that averages per-class F1, weighted by class support
scorer = make_scorer(f1_score, average='weighted')
gs_svc = GridSearchCV(estimator=svc_clf, param_grid=param_grid, scoring=scorer, cv=5)
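
For a version that can be run end to end, here is a minimal sketch. It assumes the iris dataset as a stand-in for the question's X_train and y_train, and uses a much smaller parameter grid than the original so that it finishes quickly. It also uses the predefined scoring string 'f1_weighted', which scikit-learn provides as a shorthand for the same averaging ('f1_macro' and 'f1_micro' exist as well).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: iris (3 classes) stands in for the asker's data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: a reduced grid, only to keep the example fast
param_grid = {'C': [0.1, 1.0], 'kernel': ['rbf', 'poly']}

# 'f1_weighted' is a built-in scorer name, so no make_scorer call is needed
gs_svc = GridSearchCV(
    estimator=SVC(), param_grid=param_grid, scoring='f1_weighted', cv=5
)
gs_svc.fit(X_train, y_train)

print(gs_svc.best_params_)
print(gs_svc.best_score_)
```

Whether to pick weighted, macro or micro depends on how much the minority classes matter for your problem; macro gives them the same influence as the majority classes.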