I am trying to solve a multiclass classification problem using SVC as the base estimator and GridSearchCV to tune its hyperparameters. Below are my code and the error I receive:
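import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svc_clf = SVC(C=0.7, tol=0.01, kernel='rbf', cache_size=500)
param_grid = {'C': np.linspace(0.1, 1.0, 10), 'tol': [1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1], 'kernel': ['rbf', 'poly'], 'degree': [3, 4, 5, 6, 7]}
gs_svc = GridSearchCV(estimator=svc_clf, param_grid=param_grid, scoring='f1', cv=5)  # scoring='f1' raises the error below for a multiclass y_train
gs_svc.fit(X_train, y_train)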
Below is the error received:
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
From my research, I found that the 'f1' score isn't meant for multi-class classification.
Which metric should I use with GridSearchCV for a multi-class classification problem?
Best Answer
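Accuracy might look tempting, but in general it is not a good metric, especially when the classes are imbalanced. In multiclass (and multilabel) classification you get an F1 score, precision, recall, etc. for each class separately. You need to decide how to average them, which is what the error is actually saying. The options for average are 'binary' (the default), 'micro', 'macro', 'weighted' and 'samples'. The 'binary' option reports the score of a single positive class (pos_label), so it only works for binary targets and fails for multiclass or multilabel ones; 'samples' only applies to multilabel problems.

To reiterate the sklearn documentation: 'micro' calculates TP, FP, FN, etc. globally over all classes, while 'macro' calculates the metric for each class and takes their unweighted mean. 'weighted' is the version of the 'macro' average that weights each class by its support (the number of true instances of that class), which accounts for class imbalance.

This parameter has to be passed to the scorer function, for example with make_scorer, or you can use one of the predefined scoring strings such as 'f1_micro', 'f1_macro' or 'f1_weighted'.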
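A minimal sketch, assuming scikit-learn is installed. It first shows, on a small set of hand-made 3-class labels (invented purely for illustration), how 'micro', 'macro' and 'weighted' averaging give different F1 values, then passes an averaged F1 scorer to GridSearchCV in the two usual ways: a predefined scoring string and make_scorer. The iris dataset and the reduced parameter grid are stand-ins for the question's X_train/y_train and grid, chosen only so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical 3-class labels, used only to compare the averaging modes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
f1_by_average = {
    avg: f1_score(y_true, y_pred, average=avg)
    for avg in ("micro", "macro", "weighted")
}
print(f1_by_average)

# Stand-in data for the question's X_train / y_train (iris has 3 classes).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Smaller grid than the question's, just to keep the example fast.
param_grid = {"C": np.linspace(0.1, 1.0, 4), "kernel": ["rbf", "poly"]}

# Option 1: a predefined scoring string that already fixes the averaging mode.
gs_str = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=5)
gs_str.fit(X_train, y_train)

# Option 2: build the scorer yourself and pass the average explicitly.
weighted_f1 = make_scorer(f1_score, average="weighted")
gs_custom = GridSearchCV(SVC(), param_grid, scoring=weighted_f1, cv=5)
gs_custom.fit(X_train, y_train)

print(gs_str.best_params_, gs_str.best_score_)
print(gs_custom.best_params_, gs_custom.best_score_)
```

For single-label multiclass data, micro-averaged F1 equals plain accuracy, so 'f1_macro' or 'f1_weighted' is usually the more informative choice when the classes are imbalanced.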