Unbalanced dataset – ROC curve to compare classifiers

classification, machine-learning, model-evaluation, roc, weka

I use the machine learning software WEKA for data mining on biological data. I would describe my dataset as unbalanced: it comprises around 2000 instances, split into classes of 900, 500, 350, and 160 that are very important to have in the dataset, plus some less important smaller classes that are nice to have but can be removed from the dataset if they confuse the learning too much.
Currently I am comparing many different classifiers. I am not a very experienced statistician, but I read that ROC curves are commonly used to evaluate the performance of machine learning classifiers. However, I also read that ROC has drawbacks when it comes to unbalanced datasets.
Is there a better measure among those the WEKA output provides (or that can be calculated from them) for my dataset? This is what the output looks like (here with the iris dataset):

=== Stratified cross-validation === 

Correctly Classified Instances         144               96      %   
Incorrectly Classified Instances         6                4      %   
Kappa statistic                          0.94  
Mean absolute error                      0.035 
Root mean squared error                  0.1586
Relative absolute error                  7.8705 %
Root relative squared error             33.6353 %
Total Number of Instances              150    


=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.98      0          1         0.98      0.99       0.99     Iris-setosa
                 0.94      0.03       0.94      0.94      0.94       0.952    Iris-versicolor
                 0.96      0.03       0.941     0.96      0.95       0.961    Iris-virginica
Weighted Avg.    0.96      0.02       0.96      0.96      0.96       0.968


=== Confusion Matrix === 

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica
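
As a reading aid, the per-class rates in the "Detailed Accuracy By Class" table follow directly from this confusion matrix. A minimal Python sketch (my own illustration, not part of the WEKA output) that reproduces them:

import numpy as np

# Confusion matrix from above: rows = true class, columns = predicted class.
cm = np.array([[49,  1,  0],   # a = Iris-setosa
               [ 0, 47,  3],   # b = Iris-versicolor
               [ 0,  2, 48]])  # c = Iris-virginica

total = cm.sum()
for i, name in enumerate(["Iris-setosa", "Iris-versicolor", "Iris-virginica"]):
    tp = cm[i, i]                # correct predictions for this class
    fn = cm[i, :].sum() - tp     # this class predicted as something else
    fp = cm[:, i].sum() - tp     # other classes predicted as this class
    tn = total - tp - fn - fp
    print(f"{name}: TP rate {tp / (tp + fn):.2f}, "
          f"FP rate {fp / (fp + tn):.2f}, precision {tp / (tp + fp):.3f}")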

Best Answer

In some formulations of multi-class ROC AUC, the AUC estimate is sensitive to relative class frequencies, but this is not true of all multi-class ROC AUC formulations. Moreover, the binary ROC AUC formulation is not sensitive to relative class frequencies at all. There are numerous performance measures that are sensitive to imbalanced data, accuracy among them, and insensitivity to class imbalance is one of the most appealing advantages of ROC AUC.
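
To make the contrast concrete: on a 900/100 split like yours, a degenerate classifier that always predicts the majority class scores 90% accuracy while its ROC AUC is 0.5. A minimal sketch, assuming scikit-learn is available (WEKA itself is not involved here):

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# 900 negatives, 100 positives -- roughly the imbalance discussed above.
y = np.r_[np.zeros(900), np.ones(100)]

# A useless "classifier" that predicts the majority class for everything.
pred = np.zeros_like(y)

print(accuracy_score(y, pred))   # 0.90 -- accuracy rewards the majority guess
print(roc_auc_score(y, pred))    # 0.50 -- AUC exposes that nothing was ranked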

Fawcett's paper (cited below) develops the idea that binary ROC AUC is insensitive to class composition at some length; the basic point is that ROC is all about rates, not the absolute numbers in each class. Because ROC analysis measures the relative ranking of examples, class imbalance does not change the ROC curve.
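
A quick way to convince yourself of the rates-not-counts point: replicate every negative example tenfold and the AUC does not move, because every positive/negative score pair is replicated uniformly. A minimal sketch with made-up Gaussian scores (scikit-learn assumed, purely for illustration):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 100)  # scores of 100 positive examples
neg = rng.normal(0.0, 1.0, 100)  # scores of 100 negative examples

# Balanced version: 100 positives vs 100 negatives.
auc_balanced = roc_auc_score(np.r_[np.ones(100), np.zeros(100)],
                             np.r_[pos, neg])

# Imbalanced version: every negative replicated 10 times (100 vs 1000).
auc_imbalanced = roc_auc_score(np.r_[np.ones(100), np.zeros(1000)],
                               np.r_[pos, np.tile(neg, 10)])

print(auc_balanced, auc_imbalanced)  # identical: AUC depends only on ranking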

In the multi-class case, there are a couple of ways to represent the problem. The class-reference formulation, which averages one-vs-rest AUCs, is sensitive to relative class frequencies. Alternatively, there is a way of combining all 1-vs-1 ROC AUC estimates that is not sensitive to class composition; this is developed by Hand & Till (2001), as shown in the sketch below.
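
For a concrete comparison of the two formulations, scikit-learn's roc_auc_score exposes both; in the sketch below the logistic-regression model and the 10-fold setup are my own choices, just to obtain cross-validated class probabilities analogous to WEKA's stratified cross-validation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

# Out-of-fold class probabilities from 10-fold cross-validation.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=10, method="predict_proba")

# Class-reference (one-vs-rest) AUC, weighted by class frequency:
# this is the formulation that is sensitive to class composition.
print(roc_auc_score(y, proba, multi_class="ovr", average="weighted"))

# Hand & Till (2001): unweighted average over all one-vs-one AUCs,
# insensitive to the relative class frequencies.
print(roc_auc_score(y, proba, multi_class="ovo", average="macro"))

With macro averaging over all one-vs-one pairs, each pair of classes contributes equally regardless of how many instances it holds, which is what makes the Hand & Till estimate robust to a 900/500/350/160 split like yours.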


Tom Fawcett, "ROC Graphs: Notes and Practical Considerations for Data Mining Researchers." Technical report, Intelligent Enterprise Technologies Laboratory, HP Laboratories, Palo Alto, 2003. (If I recall correctly, this paper was eventually published in a peer-reviewed journal a few years later; I can't find the reference right now. Same author, similar title.)

David J. Hand, Robert J. Till, "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning, 45(2):171-186, 2001.

This is also developed in A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms." Pattern Recognition, 30:1145-1159, 1997.
