Solved – Hyperparameter tuning in a multiclass classification problem: is the scoring metric relevant?

machine-learning, metric, model-evaluation

I'm working with an imbalanced multiclass dataset. I'm trying to tune the hyperparameters of a DecisionTreeClassifier, a RandomForestClassifier, and a GradientBoostingClassifier using a randomized search and a Bayesian search.

So far I have used plain accuracy for the scoring, which is not really suitable for assessing my models' performance on imbalanced data (not that I'm using it for that). Is it also unsuitable for the hyperparameter tuning itself?

I found that, for example, recall_micro yields exactly the same results as accuracy; the same should hold for other micro-averaged metrics such as f1_micro.
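As a quick sanity check (with made-up toy labels), all three micro-averaged scores reduce to the fraction of correct predictions:

```python
# Toy labels only, to illustrate the equivalence: for single-label
# multiclass predictions, micro-averaging pools all samples, so micro
# recall and micro F1 both collapse to plain accuracy.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1, 2]

print(accuracy_score(y_true, y_pred))                 # 0.75
print(recall_score(y_true, y_pred, average="micro"))  # 0.75
print(f1_score(y_true, y_pred, average="micro"))      # 0.75
```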

So my question is: is the scoring metric relevant for tuning? Can a different metric lead to different results? If so, which metric should I use?

Best Answer

Yes, the scoring is relevant.

Hyperparameter tuning works by ranking the candidate hyperparameter sets and choosing the best one, where "best" is defined by the scoring metric. Metrics that rank the candidates differently will therefore select different hyperparameters. Ideally, the scoring metric should be identical to the metric used for the final evaluation.
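As a rough sketch of how the scoring metric enters the search in scikit-learn (synthetic data and a hypothetical parameter grid, purely to illustrate that the scoring argument drives which configuration wins):

```python
# Sketch with synthetic imbalanced data: the same randomized search run
# under different scoring metrics. Each metric can rank the candidate
# hyperparameter sets differently, so best_params_ may differ.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           weights=[0.8, 0.15, 0.05], random_state=0)

param_distributions = {           # hypothetical grid, for illustration
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

for scoring in ("accuracy", "balanced_accuracy", "f1_macro"):
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0), param_distributions,
        n_iter=8, scoring=scoring, cv=5, random_state=0)
    search.fit(X, y)
    print(scoring, search.best_params_)
```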

For an imbalanced multiclass dataset I would recommend average classwise accuracy (the mean of the diagonal of the row-normalized confusion matrix, i.e. the macro-averaged recall), since it is not biased towards the class with the highest number of samples.
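For concreteness, a minimal sketch with made-up labels. This quantity is what scikit-learn calls balanced accuracy (equivalently, macro-averaged recall), so it can be passed straight to a search as scoring="balanced_accuracy":

```python
# Toy labels, to show the equivalence: the mean of the diagonal of the
# row-normalized confusion matrix equals balanced accuracy, which in
# turn equals macro-averaged recall.
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             recall_score)

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 1]

cm = confusion_matrix(y_true, y_pred)
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # recall of each class
print(per_class_acc.mean())                           # 0.5833...
print(balanced_accuracy_score(y_true, y_pred))        # 0.5833...
print(recall_score(y_true, y_pred, average="macro"))  # 0.5833...
```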