Solved – Classification score for a multiclass dataset with class imbalance

cross-validation, multi-class, scikit-learn, unbalanced-classes

I am searching for a classification score, preferably provided by Python scikit-learn, to evaluate classification in a cross-validation routine.

This classification score must be suitable for:

  • strong class imbalance
  • multiclass classification

The cardinality of the classes is the following:

         N 
Class1  19
Class2  34
Class3   8
Class4  17

Update

I defined a custom scorer based on the ROC AUC score from sklearn. I extended it to the multi-class problem by averaging the per-class scores in a one-vs-all fashion. Is this a sound approach? Does it have drawbacks?

Here is the Python/sklearn code:

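from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.preprocessing import LabelBinarizer

def custom_avg_roc_auc_score(truth, pred):
    # One-hot encode the labels so that each class becomes a one-vs-all problem.
    lb = LabelBinarizer()
    lb.fit(truth)

    # pred holds hard class labels, so each per-class ROC curve has a single operating point.
    truth = lb.transform(truth)
    pred = lb.transform(pred)

    # Unweighted mean of the per-class AUC values.
    return roc_auc_score(truth, pred, average="macro")

avg_roc_auc_scorer = make_scorer(custom_avg_roc_auc_score)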
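
For comparison, here is a minimal sketch of how a multiclass AUC score could be used inside a cross-validation routine. It does not use the custom scorer above; instead it relies on scikit-learn's built-in "roc_auc_ovr" scoring string, which computes a macro-averaged one-vs-rest AUC from predicted probabilities rather than from hard labels. The synthetic data (drawn only to match the class sizes listed above), the choice of LogisticRegression, and the 4-fold stratified split are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): class sizes follow the table in the question.
rng = np.random.default_rng(0)
class_sizes = {"Class1": 19, "Class2": 34, "Class3": 8, "Class4": 17}
X_parts, y_parts = [], []
for k, (label, n) in enumerate(class_sizes.items()):
    # Each class is a 2-D Gaussian blob with its own mean.
    X_parts.append(rng.normal(loc=2.0 * k, scale=1.0, size=(n, 2)))
    y_parts.extend([label] * n)
X = np.vstack(X_parts)
y = np.array(y_parts)

# Stratified folds keep every class present in each test fold,
# which the one-vs-rest AUC needs in order to be defined.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

# "roc_auc_ovr" scores the classifier's predict_proba output.
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc_ovr")
print(scores, scores.mean())
```

With only 8 samples in the smallest class, each of the 4 test folds contains 2 of them, so the per-fold scores can vary noticeably from fold to fold.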

Best Answer

I'd like to highlight two possible options for multiclass performance metrics under class imbalance:

For the latter (ROC/AUC): since you have $N$ classes and ROC/AUC is conceptually designed for two-class problems, you will likely need to compute one ROC curve and one AUC value per class. This can be done in a "1-vs-all" manner, where each class in turn is treated as the positive class and all other classes as the negative class, which shows how much that class is confused with the others. The resulting $N$ AUC values can be inspected as a distribution (e.g. with boxplots) to compare several models and select the best-suited one. If the process needs to be fully automated, consider computing the mean or median and the standard deviation (sd) or median absolute deviation (mad) of the AUC over all classes: the former indicates the "average" performance across classes, the latter the spread of performance between classes. Doing this for every model yields scalar values that you can use to select a model suited to your problem.
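
The per-class procedure described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names `per_class_auc` and `summarise` are hypothetical, the data are synthetic (only the class sizes come from the question), and out-of-fold probabilities from a LogisticRegression stand in for whatever model is being evaluated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def per_class_auc(y_true, proba, classes):
    """One-vs-all AUC per class; proba[:, i] is the score for classes[i]."""
    return {
        str(c): roc_auc_score(y_true == c, proba[:, i])
        for i, c in enumerate(classes)
    }

def summarise(aucs):
    """Location (mean/median) and spread (sd/mad) of the per-class AUCs."""
    values = np.array(list(aucs.values()))
    median = np.median(values)
    return {
        "mean": values.mean(),
        "median": median,
        "sd": values.std(ddof=1),
        "mad": np.median(np.abs(values - median)),
    }

# Synthetic stand-in data (assumption) with the class sizes from the question.
rng = np.random.default_rng(0)
sizes = [19, 34, 8, 17]
y = np.repeat(["Class1", "Class2", "Class3", "Class4"], sizes)
X = rng.normal(size=(len(y), 2)) + 2.0 * np.repeat(np.arange(4), sizes)[:, None]

# Out-of-fold class probabilities; columns follow the sorted class labels.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba"
)

aucs = per_class_auc(y, proba, np.unique(y))
summary = summarise(aucs)
print(aucs)
print(summary)
```

Repeating this for each candidate model gives one summary per model, so models can be ranked on the mean or median AUC while the sd or mad flags models whose performance is uneven across classes.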
