I'm working with an imbalanced multiclass dataset. I'm trying to tune the hyperparameters of a `DecisionTreeClassifier`, a `RandomForestClassifier` and a `GradientBoostingClassifier` using a randomized search and a Bayesian search. So far I have only used accuracy for the scoring, which is not really suitable for assessing my models' final performance (which I'm not doing here). Is it also unsuitable for the hyperparameter tuning?
I found that, for example, `recall_micro` yields the same results as accuracy; the same should hold for other micro-averaged metrics like `f1_micro`.
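A quick check illustrates why: in single-label multiclass prediction, every false negative for one class is simultaneously a false positive for another, so micro-averaging pools them and `recall_micro` (and `f1_micro`) reduce to plain accuracy. A minimal sketch with toy labels (not from the question) using scikit-learn:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# toy single-label multiclass ground truth and predictions
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy_score(y_true, y_pred)
rec_micro = recall_score(y_true, y_pred, average="micro")
f1_micro = f1_score(y_true, y_pred, average="micro")

# micro-averaging pools TP/FP/FN across classes; since each error is
# exactly one FN and one FP, all three metrics coincide
print(acc, rec_micro, f1_micro)
```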
So my question is: Is the scoring relevant? Can a different metric lead to different results? If yes, which metric should I use?
Best Answer
Yes, the scoring is relevant.
Hyperparameter tuning ranks candidate hyperparameter sets by a scoring metric and picks the best one, so the choice of metric directly determines which model you end up with. Ideally the tuning metric should be identical to the final evaluation metric.
For an imbalanced multiclass dataset I would recommend using average classwise accuracy (the mean of the diagonal of the row-normalized confusion matrix, i.e. macro-averaged recall), since it is not biased towards the class with the most samples.
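Average classwise accuracy is what scikit-learn exposes as balanced accuracy. A minimal sketch on a synthetic imbalanced dataset (a stand-in for the real data) verifying that `balanced_accuracy_score` equals the mean of the diagonal of the row-normalized confusion matrix, and showing how to pass it to `RandomizedSearchCV` via the built-in `'balanced_accuracy'` scoring string:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# synthetic imbalanced 3-class problem (illustrative only)
X, y = make_classification(n_samples=600, n_classes=3, n_informative=4,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# a deliberately shallow tree so the predictions are imperfect
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
pred = clf.predict(X)

# balanced accuracy == mean of the diagonal of the row-normalized
# confusion matrix (i.e. the average per-class recall)
cm = confusion_matrix(y, pred, normalize="true")
print(balanced_accuracy_score(y, pred), np.diag(cm).mean())

# use it as the tuning criterion instead of plain accuracy
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": [2, 4, 8, None],
                         "min_samples_leaf": [1, 5, 20]},
    n_iter=5, scoring="balanced_accuracy", cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The same `scoring="balanced_accuracy"` argument works unchanged for `RandomForestClassifier` and `GradientBoostingClassifier` searches.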