Solved – the best measure for unbalanced multi-class classification problem

classification, metric, unbalanced-classes

Which classification metrics are suitable for an unbalanced problem? Because the class distribution is skewed, accuracy is not very meaningful. For instance, if I assign every sample to class 1, I still get 70% accuracy.

Best Answer

My apologies, just saw how old the question was -- why was it on the top of the list?

Answer (which is as good as it gets with limited information):

What kind of data is it?

You should probably never use detection accuracy, and certainly not when your classifier outputs a score or probability. How do you classify? The loss function underlying your classification algorithm is usually a good measure to start with when evaluating performance.

I would not lean towards one-vs-all analytic approaches such as precision-recall curves. They won't get you very far: you would have to test each class against all the others and then combine the results somehow. By a harmonic mean? Weighted by the a-priori likelihood of the class being tested? It is unclear what such combined measures will actually tell you.
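To make the combination problem concrete, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration, and the helper name `one_vs_all_scores` is hypothetical. It computes one-vs-all precision, recall and F1 (their harmonic mean) per class, then combines the per-class F1 values in two ways: an unweighted mean and a mean weighted by class prior.

```python
def one_vs_all_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)}, treating each class as 'positive' in turn."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Hypothetical data: 70% class 1, 20% class 2, 10% class 3.
y_true = [1] * 70 + [2] * 20 + [3] * 10
# Assumed predictions: class 1 always right, half of classes 2 and 3 swallowed by class 1.
y_pred = [1] * 70 + [1] * 10 + [2] * 10 + [1] * 5 + [3] * 5

scores = one_vs_all_scores(y_true, y_pred)
priors = {c: y_true.count(c) / len(y_true) for c in scores}

macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
weighted_f1 = sum(priors[c] * scores[c][2] for c in scores)
print(scores)
print(macro_f1, weighted_f1)
```

The two summaries disagree on the same predictions. The prior-weighted figure is dominated by the majority class, while the unweighted one treats all classes alike, which is one way to see why the choice of combination rule matters.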

If you have probabilistic output, the negative log likelihood is a good place to start.
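As a rough illustration, the mean negative log likelihood can be computed directly from the predicted probability of the true class. The sketch below is plain Python; the function name, the clipping constant `eps` and the example probabilities are assumptions made for illustration, not part of the original answer.

```python
import math

def negative_log_likelihood(probs, labels, eps=1e-15):
    """Mean negative log likelihood (multi-class log loss).

    probs  : list of per-sample probability lists, one entry per class
    labels : list of true class indices (0-based)
    eps    : clipping constant so that log(0) never occurs (an assumption)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = min(max(p[y], eps), 1.0 - eps)  # probability assigned to the true class
        total -= math.log(p_true)
    return total / len(labels)

# Hypothetical 3-class predictions for three samples with true classes 0, 1, 2.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 1, 2]
nll = negative_log_likelihood(probs, labels)

# Reference point: a classifier that always outputs the uniform distribution
# scores log(number of classes).
uniform_nll = negative_log_likelihood([[1 / 3] * 3] * 3, labels)
print(nll, uniform_nll)
```

Lower is better. Unlike accuracy, the score penalizes confident wrong predictions on small classes, and comparing against the uniform (or class-prior) baseline gives a sense of scale.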

If predicting class 1 everywhere already gives 70% accuracy, which means 70% of your dataset is class 1, then your classifier may be giving up on some of the smaller classes and instead trying to satisfy a possible regularization term. But this all really depends on your classification scheme. If you want a clearer answer, you need to tell us the whole story. ;)
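The 70% situation is easy to reproduce. In the minimal sketch below, the 70/20/10 class split is an assumption chosen to match the question. A classifier that always predicts class 1 gets 70% accuracy, while the per-class recall shows that the two smaller classes are never detected.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted samples / samples of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in sorted(support)}

# Hypothetical label distribution: 70% class 1, 20% class 2, 10% class 3.
y_true = [1] * 70 + [2] * 20 + [3] * 10
y_pred = [1] * len(y_true)  # always predict the majority class

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)
print(acc, recalls, macro_recall)
```

The mean per-class recall is the chance level for three classes, which exposes what the raw accuracy hides.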

Related Solutions

Solved – Which performance measure for unbalanced binary classification without an ‘active’ class

Have a look at the Matthews Correlation Coefficient:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{ (TP + FP)(TP + FN)(TN + FP)(TN + FN) }}$$

I have seen it used quite often as a performance metric in the classification of SNP datasets. Have a look at this link as well; they discuss the difference between AUC and MCC.
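A minimal sketch of the MCC formula above, computed from the four confusion-matrix counts. The example counts are invented, and returning 0 when the denominator vanishes is a common convention that is assumed here rather than taken from the answer.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical unbalanced problem: 10 positives, 90 negatives.
all_negative = mcc(tp=0, tn=90, fp=0, fn=10)  # classifier that never predicts positive
reasonable = mcc(tp=6, tn=85, fp=5, fn=4)     # classifier that finds 6 of the 10 positives
print(all_negative, reasonable)
```

The always-negative classifier has 90% accuracy on these counts but an MCC of 0 under the assumed convention, so the metric does not reward ignoring the minority class.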

Otherwise you can just compute an average accuracy (one minus the average error rate); I have seen people use it in multi-class problems as well.

$$AAcc = \frac{1}{2} \bigg( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \bigg) $$

It is usually used in authentication systems in the form of the Half Total Error Rate. E.g. here they provide a statistical test for it.
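The relation between the two quantities can be checked numerically. The sketch below assumes the usual definition of the Half Total Error Rate as the mean of the false acceptance rate and the false rejection rate; the counts are invented for illustration.

```python
def average_accuracy(tp, tn, fp, fn):
    """AAcc: mean of the true positive rate and the true negative rate."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)

def half_total_error_rate(tp, tn, fp, fn):
    """HTER: mean of the false acceptance rate and the false rejection rate (assumed definition)."""
    false_acceptance = fp / (fp + tn)
    false_rejection = fn / (fn + tp)
    return 0.5 * (false_acceptance + false_rejection)

# Hypothetical counts: 10 positives, 90 negatives.
counts = dict(tp=6, tn=85, fp=5, fn=4)
aacc = average_accuracy(**counts)
hter = half_total_error_rate(**counts)
print(aacc, hter, aacc + hter)
```

Under that definition the two always sum to one, so reporting either is equivalent. Both weight the two classes equally regardless of how many samples each contains.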

Classification – Multi-Class Classification with Imbalanced Classes: Techniques and Strategies

As your class sizes are so big, I would first downsample to something like 5000+10000+10000+10000+10000. Do you really need more samples? If so, downsample again, model each subsample independently, and aggregate the resulting forests afterwards. That will save time and memory. During modeling you may even bootstrap only ~5000 samples for each tree to speed up the process. The bootstrap for each tree can be stratified, such that 1000 samples are selected from each class.

Here's a thread on how to train a balanced multi-class forest with downsampling and a 1-vs-rest ROC plot.

And here's an R code example of 1-vs-rest ROC plots:
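The stratified bootstrap idea can be sketched independently of any forest implementation. The snippet below is plain Python rather than R; the helper name `stratified_bootstrap` and the scaled-down class sizes are assumptions for illustration. It draws the same number of indices, with replacement, from every class, which is what each tree would then be trained on.

```python
import random
from collections import Counter, defaultdict

def stratified_bootstrap(labels, n_per_class, rng):
    """Return sample indices: n_per_class draws with replacement from each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical unbalanced labels for five classes (sizes scaled down for the example).
labels = [0] * 50 + [1] * 500 + [2] * 1000 + [3] * 2000 + [4] * 4000

# One balanced bootstrap sample, as would be drawn for a single tree.
idx = stratified_bootstrap(labels, n_per_class=100, rng=rng)
counts = Counter(labels[i] for i in idx)
print(counts)
```

Each tree sees a balanced sample even though the full dataset is not, and repeating the draw per tree means most of the large classes still get used across the forest.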

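library(AUC)
# simulated probabilistic predictions (yhat) vs true class (y)
set.seed(1)  # added for reproducibility
obs = 500
nClass = 5
y = sample(1:nClass, obs, replace = TRUE)
# each column of yhat holds the predicted class probabilities for one observation
yhat = sapply(y, function(y) {
  pred.prob = rep(0, nClass)
  pred.prob[y] = 0.2                     # small boost for the true class
  pred.prob = pred.prob + runif(nClass)  # add uniform noise
  pred.prob / sum(pred.prob)             # normalize so the probabilities sum to 1
})

# plot 1-vs-all, one curve for each class
for (i in 1:nClass) plot(roc(predictions = yhat[i, ],
                             labels = as.factor(y == i)),
                         add = i != 1,
                         col = i)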
