Solved – Hyperparameter tuning in a multiclass classification problem: is the scoring metric relevant?

machine-learning, metric, model-evaluation

I'm working with an imbalanced multiclass dataset. I'm trying to tune the hyperparameters of a DecisionTreeClassifier, a RandomForestClassifier, and a GradientBoostingClassifier using a randomized search and a Bayesian search.

So far I have used plain accuracy for the scoring, which is not really suitable for assessing my models' performance on imbalanced data (not that I'm using it for that). Is it also unsuitable for the hyperparameter tuning itself?

I found that, for example, recall_micro yields exactly the same results as accuracy; the same should hold for other micro-averaged metrics such as f1_micro.
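As a quick sanity check (with made-up toy labels), all three micro-averaged scores reduce to the fraction of correct predictions:

```python
# Toy labels only, to illustrate the equivalence: for single-label
# multiclass predictions, micro-averaging pools all samples, so micro
# recall and micro F1 both collapse to plain accuracy.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1, 2]

print(accuracy_score(y_true, y_pred))                 # 0.75
print(recall_score(y_true, y_pred, average="micro"))  # 0.75
print(f1_score(y_true, y_pred, average="micro"))      # 0.75
```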

So my question is: is the scoring metric relevant for tuning? Can a different metric lead to different results? If so, which metric should I use?

Best Answer

Yes, the scoring is relevant.

Hyperparameter tuning works by ranking the candidate hyperparameter sets and choosing the best one, where "best" is defined by the scoring metric. Metrics that rank the candidates differently will therefore select different hyperparameters. Ideally, the scoring metric should be identical to the metric used for the final evaluation.
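As a rough sketch of how the scoring metric enters the search in scikit-learn (synthetic data and a hypothetical parameter grid, purely to illustrate that the scoring argument drives which configuration wins):

```python
# Sketch with synthetic imbalanced data: the same randomized search run
# under different scoring metrics. Each metric can rank the candidate
# hyperparameter sets differently, so best_params_ may differ.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           weights=[0.8, 0.15, 0.05], random_state=0)

param_distributions = {           # hypothetical grid, for illustration
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

for scoring in ("accuracy", "balanced_accuracy", "f1_macro"):
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0), param_distributions,
        n_iter=8, scoring=scoring, cv=5, random_state=0)
    search.fit(X, y)
    print(scoring, search.best_params_)
```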

For an imbalanced multiclass dataset I would recommend average classwise accuracy (the mean of the diagonal of the row-normalized confusion matrix, i.e. the macro-averaged recall), since it is not biased towards the class with the highest number of samples.
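For concreteness, a minimal sketch with made-up labels. This quantity is what scikit-learn calls balanced accuracy (equivalently, macro-averaged recall), so it can be passed straight to a search as scoring="balanced_accuracy":

```python
# Toy labels, to show the equivalence: the mean of the diagonal of the
# row-normalized confusion matrix equals balanced accuracy, which in
# turn equals macro-averaged recall.
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             recall_score)

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 1]

cm = confusion_matrix(y_true, y_pred)
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # recall of each class
print(per_class_acc.mean())                           # 0.5833...
print(balanced_accuracy_score(y_true, y_pred))        # 0.5833...
print(recall_score(y_true, y_pred, average="macro"))  # 0.5833...
```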