Solved – Comparing F1 score across imbalanced data sets

classification, machine-learning, model-evaluation, unbalanced-classes

I am working with multiple strongly imbalanced binary data sets (the majority class has more than 20x as many examples as the minority class). Although all the data sets are strongly imbalanced, the class ratio differs between them (e.g., one data set has 360 minority examples out of 5600, another has 120 out of 6400).

Can you compare model performance based on F1 scores across strongly imbalanced data sets with slightly different class ratios? For example, if I got an F1 score of 0.3 on data set A (class ratio 360/5600) and an F1 score of 0.6 on data set B (class ratio 120/6400), can I say that the classification model performed better on data set B?

If not, is there another performance metric that can be compared across data sets in this way?
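
For reference, the F1 score is the harmonic mean of precision and recall (standard definitions, restated here for context):

$$
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},
\qquad
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}.
$$

Because false positives come from the majority class, precision (and therefore F1) is sensitive to the class ratio, which is exactly what makes the cross-dataset comparison questionable.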

Best Answer

F1 is a suitable measure for models tested on imbalanced datasets. But F1 is a measure of a model's performance, not of a dataset: you cannot use it to say that dataset A is better than dataset B. There is no better or worse here; a dataset is just a dataset.
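
As a minimal sketch of how such a comparison is typically computed (the classifier, the synthetic data, and the helper f1_on_dataset below are hypothetical illustrations, not part of the question), using scikit-learn:

    # Minimal sketch: F1 for the same kind of model on two imbalanced
    # synthetic datasets. All data and modeling choices here are
    # hypothetical illustrations.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    def f1_on_dataset(n_samples, minority_frac, seed=0):
        """Train and evaluate a classifier on one synthetic imbalanced
        dataset; return F1 for the minority (positive) class."""
        X, y = make_classification(
            n_samples=n_samples,
            weights=[1.0 - minority_frac],  # proportion of the majority class
            random_state=seed,
        )
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, stratify=y, random_state=seed
        )
        model = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
        return f1_score(y_te, model.predict(X_te))  # pos_label=1 by default

    # Class ratios roughly matching the question: 360/5600 and 120/6400.
    print("data set A:", f1_on_dataset(5600, 360 / 5600))
    print("data set B:", f1_on_dataset(6400, 120 / 6400))

Each number this prints characterizes a model-dataset pair: the scores differ both because the classification task differs and because the class prevalence differs, so they rank neither the datasets themselves nor the model in isolation.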
