Solved – Suitable performance metric for an unbalanced multi-class classification problem

classification, metric, model-evaluation, unbalanced-classes

I have an unbalanced multi-class classification problem with the following class distributions:

Class 0: 17.1% 
Class 1: 63.2% 
Class 2: 19.7%

I am using scikit-learn's Support Vector Classifier with 'balanced' class weights to classify samples into one of the three classes. However, I am not sure which performance metric is most suitable for evaluating the result. So far I have been using the micro-averaged F1 score, but is this really the best option for a multi-class imbalance problem?
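
For context, here is a minimal sketch of the setup described above; the synthetic data, split, and variable names are placeholder assumptions, not the actual code:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Synthetic stand-in for the real 3-class data set with roughly the stated class shares
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=5,
                           weights=[0.171, 0.632, 0.197], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# SVC with 'balanced' class weights, as described in the question
clf = SVC(class_weight="balanced").fit(X_train, y_train)
y_pred = clf.predict(X_test)

# The metric used so far: micro-averaged F1
print(f1_score(y_test, y_pred, average="micro"))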

Best Answer

Many useful metrics have been introduced for evaluating the performance of classification methods on imbalanced data sets, among them Cohen's Kappa, CEN (confusion entropy), MCEN, MCC (Matthews correlation coefficient), and DP (discriminant power). Note that in a single-label multi-class setting the micro-averaged F1 score is equal to plain accuracy, so it is dominated by the majority class; the metrics above take the full confusion matrix into account instead.
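
Several of these are also available directly in scikit-learn, which you are already using; a quick sketch with made-up label vectors for the three classes:

from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

# Made-up true/predicted labels for classes 0, 1 and 2
y_true = [0, 1, 1, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 1, 2, 2, 1, 1, 0, 1, 2]

print(cohen_kappa_score(y_true, y_pred))  # Cohen's Kappa
print(matthews_corrcoef(y_true, y_pred))  # multi-class MCC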

Disclaimer:

If you use Python, the PyCM module can help you find and calculate these metrics.

Here is a simple snippet that gets the recommended metrics from this module:

>>> from pycm import ConfusionMatrix

>>> # Confusion matrix in nested-dict form: {actual_class: {predicted_class: count}}
>>> cm = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 2}, "Class2": {"Class1": 0, "Class2": 5}})

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

>>> score = cm.Kappa  # pick one of the recommended metrics, e.g. Cohen's Kappa
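
The toy matrix above has only two classes; for the three-class problem in the question you can also build the confusion matrix straight from label vectors (made-up values here) and read off the same metrics:

>>> # Made-up actual/predicted labels for classes 0, 1 and 2
>>> actual = [0, 1, 1, 1, 2, 2, 1, 0, 1, 2]
>>> predicted = [0, 1, 1, 2, 2, 1, 1, 0, 1, 2]

>>> cm3 = ConfusionMatrix(actual_vector=actual, predict_vector=predicted)
>>> print(cm3.recommended_list)
>>> score = cm3.Kappa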