Solved – the best measure for an unbalanced multi-class classification problem

classification, metric, unbalanced-classes

What are some possible classification metrics for an unbalanced problem? Because of the skewness of the class distribution, accuracy is not very meaningful. For instance, if I predict class 1 for every instance, I still get 70% accuracy.
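To make the arithmetic concrete, here is a minimal Python sketch. It assumes a hypothetical 3-class label set with a 70/20/10 split (only the 70% comes from the question) and shows that a "classifier" that always predicts the majority class scores exactly the majority-class fraction.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 3-class labels: 70% class 1, 20% class 2, 10% class 3.
y_true = [1] * 70 + [2] * 20 + [3] * 10

# Trivial "classifier": always predict the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(majority, accuracy(y_true, y_pred))  # 1 0.7
```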

Best Answer

My apologies, I just saw how old the question is. Why was it at the top of the list?

Answer (which is as good as it gets with limited information):

What kind of data is it?

You should probably never use detection accuracy, and certainly not when your classifier outputs a score or probability. How do you classify? The underlying loss function of your classification algorithm is usually a good measure to start with when it comes to evaluating performance.
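As a minimal sketch of why accuracy throws information away once a classifier outputs probabilities, the Python below scores two hypothetical models that make identical arg-max predictions. It assumes the classifier was trained with multinomial cross-entropy (as softmax/logistic regression is), so that is the loss used for evaluation here; the data and model names are made up for illustration.

```python
import math

def accuracy(y_true, probs):
    """Accuracy of the arg-max prediction taken from each row of class probabilities."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(t == k for t, k in zip(y_true, preds)) / len(y_true)

def cross_entropy(y_true, probs):
    """Mean multinomial cross-entropy (log loss) of the true labels."""
    return -sum(math.log(p[t]) for t, p in zip(y_true, probs)) / len(y_true)

# Hypothetical 0-indexed labels and two models' predicted class probabilities.
y_true = [0, 0, 1, 2]
confident = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
hesitant = [[0.4, 0.3, 0.3], [0.4, 0.35, 0.25], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), round(cross_entropy(y_true, probs), 3))
# confident 1.0 0.194
# hesitant 1.0 0.916
```

Both models are "100% accurate", but the loss clearly separates the one that is confidently right from the one that barely picks the right class.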

I would not lean towards one-vs-all analytic approaches, such as precision-recall curves. They won't get you very far: you would have to test each class against all the others and then combine the results somehow. With a harmonic mean? Weighted by the a priori likelihood of the class being tested? It is unclear what such combined measures would actually tell you.
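To illustrate the combination problem, here is a minimal Python sketch with hypothetical labels and predictions. It computes one-vs-all precision for each class, then combines the three numbers in a few ways one might try: a plain mean, a prior-weighted mean, and a harmonic mean. The helper name and the convention of scoring a never-predicted class as 0 are assumptions for illustration.

```python
import statistics
from collections import Counter

def one_vs_all_precision(y_true, y_pred, cls):
    """Precision with `cls` as the positive class and all other classes as negative."""
    hits = [t == cls for t, p in zip(y_true, y_pred) if p == cls]
    # Assumption: a class that is never predicted gets precision 0.
    return sum(hits) / len(hits) if hits else 0.0

# Hypothetical labels (70/20/10 split) and predictions.
y_true = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 1, 1, 1, 2, 2, 1, 1]

classes = sorted(set(y_true))
prec = {c: one_vs_all_precision(y_true, y_pred, c) for c in classes}
priors = {c: n / len(y_true) for c, n in Counter(y_true).items()}

plain_mean = sum(prec.values()) / len(prec)
prior_weighted = sum(priors[c] * prec[c] for c in classes)
harmonic = statistics.harmonic_mean(list(prec.values()))

print(prec)  # {1: 0.75, 2: 0.5, 3: 0.0}
print(round(plain_mean, 3), round(prior_weighted, 3), harmonic)  # 0.417 0.625 0
```

Three reasonable-sounding summaries of the same predictions give three different numbers, and the harmonic mean collapses to 0 as soon as one class is never predicted.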

If you have probabilistic output, the negative log-likelihood is a good place to start.
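A minimal Python sketch of the multi-class negative log-likelihood (the mean of -log of the probability assigned to the true class), on hypothetical 0-indexed labels with a 70/20/10 split. The baseline that always outputs the class frequencies is a handy reference point: its NLL equals the entropy of the empirical label distribution, so a model that has learned something should score below it. The `eps` clipping value is an assumption to keep the result finite.

```python
import math

def negative_log_likelihood(y_true, probs, eps=1e-15):
    """Mean of -log p(true class) over the samples.

    Probabilities are clipped at `eps` (an assumed value) so that a single
    confident mistake yields a large but finite penalty instead of infinity.
    """
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, probs)) / len(y_true)

# Hypothetical 0-indexed labels with a 70/20/10 class split.
y_true = [0] * 7 + [1] * 2 + [2]

# Baseline: ignore the input and always output the class frequencies.
baseline = [[0.7, 0.2, 0.1] for _ in y_true]
# Hypothetical model that is fairly confident and right on every sample.
model = [[0.8, 0.1, 0.1]] * 7 + [[0.2, 0.7, 0.1]] * 2 + [[0.2, 0.2, 0.6]]

print(round(negative_log_likelihood(y_true, baseline), 3))  # 0.802
print(round(negative_log_likelihood(y_true, model), 3))     # 0.279
```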

If you already get 70% accuracy by predicting class 1 for everything, which means 70% of your dataset belongs to class 1, then your classifier may be giving up on some of the smaller classes and instead trying to satisfy a possible regularization term. But this all depends on your classification scheme. If you want a clearer answer, you need to tell us the whole story. ;)
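One possible way to spot a classifier that has given up on the small classes is to look at recall per class rather than overall accuracy. A minimal Python sketch, assuming hypothetical labels and predictions; the helper name is made up for illustration.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the classifier labels correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}

# Hypothetical predictions from a classifier that has given up on class 3.
y_true = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.8
print(per_class_recall(y_true, y_pred))  # {1: 1.0, 2: 0.5, 3: 0.0}
```

Overall accuracy looks respectable at 0.8, yet class 3 is never recovered.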