For unbalanced classes, I would suggest going with the weighted F1-score, or the average/weighted AUC.
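As a quick illustration of the weighted-AUC option, here is a minimal sketch assuming scikit-learn; the dataset, model, and class proportions below are placeholders of my own, not from the original question.

```python
# Weighted one-vs-rest AUC on an imbalanced multiclass problem.
# The data and model here are made-up examples.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.80, 0.15, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)

# average='weighted' weights each class's one-vs-rest AUC by its support
print(roc_auc_score(y_test, proba, multi_class='ovr', average='weighted'))
```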
Let's first look at the F1-score for binary classification.
The F1-score gives a larger weight to the lower of the two numbers.
For example,
- when Precision is 100% and Recall is 0%, the F1-score will be 0%, not 50%.
- Say Classifier A has precision = recall = 80%, while Classifier B has precision = 60% and recall = 100%.
Arithmetically, the mean of precision and recall is the same for both models (80%). But with F1's harmonic-mean formula,
the score for Classifier A will be 80%, while for Classifier B it will be only 75%.
Classifier B's low precision pulled down its F1-score.
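A quick check of the arithmetic above, in plain Python, using only the numbers from the example:

```python
# Harmonic mean of precision and recall, i.e. the F1-score
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.80, 0.80))  # Classifier A -> 0.80
print(f1(0.60, 1.00))  # Classifier B -> 0.75
```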
Now, let's move on to multiclass classification.
Suppose we have five classes, class_1 through class_5,
and we compute the model's precision and recall for each class as follows.
Formula for precision for each class = (True Positives for that class)/(Count of predicted positives for that class)
e.g. precision for class_1 = (True Positives for class_1)/(Count predicted as class_1)
Formula for recall for each class = (True Positives for that class)/(Actual positives for that class)
e.g. recall for class_1 = (True Positives for class_1)/(Total instances of class_1)
Formula for F1: F1 is the harmonic mean of precision and recall, i.e.
F1 = 2*(Precision*Recall)/(Precision+Recall)
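Here is a sketch of those per-class formulas computed from a confusion matrix, assuming scikit-learn; the labels 1..5 and the y_true/y_pred vectors are hypothetical stand-ins for class_1..class_5.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for classes 1..5
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 2]
y_pred = [1, 2, 2, 2, 3, 1, 4, 4, 5, 4, 1, 2]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4, 5])
tp = np.diag(cm)                  # true positives per class
precision = tp / cm.sum(axis=0)   # TP / count predicted as that class
recall = tp / cm.sum(axis=1)      # TP / actual instances of that class
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```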
Macro-F1 = (Class_1_F1 + Class_2_F1 + Class_3_F1 + Class_4_F1 + Class_5_F1)/5
Macro-Precision = (Class_1_Precision + Class_2_Precision + Class_3_Precision + Class_4_Precision + Class_5_Precision)/5
Macro-Recall = (Class_1_Recall + Class_2_Recall + Class_3_Recall + Class_4_Recall + Class_5_Recall)/5
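With scikit-learn, macro averaging is just average='macro'; this sketch reuses the hypothetical y_true/y_pred from above.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Same hypothetical predictions as the sketch above
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 2]
y_pred = [1, 2, 2, 2, 3, 1, 4, 4, 5, 4, 1, 2]

# average='macro' is the unweighted mean of the five per-class scores
print(precision_score(y_true, y_pred, average='macro'))
print(recall_score(y_true, y_pred, average='macro'))
print(f1_score(y_true, y_pred, average='macro'))
```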
Problem with the macro calculation: macro-averaging gives equal weight to each class, no matter how many samples it has, so rare classes count as much as common ones.
Weighted F1 Score:
We don't have to do that: in the weighted-average F1-score, or weighted-F1,
we weight the F1-score of each class by the number of samples from that class.
Weighted F1 Score = (N1*Class_1_F1 + N2*Class_2_F1 + N3*Class_3_F1 + N4*Class_4_F1 + N5*Class_5_F1)/(N1 + N2 + N3 + N4 + N5)
where Ni is the number of samples in class_i.
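In scikit-learn this is average='weighted'; again a sketch on the hypothetical data from above.

```python
from sklearn.metrics import f1_score

# Same hypothetical predictions as above
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 2]
y_pred = [1, 2, 2, 2, 3, 1, 4, 4, 5, 4, 1, 2]

# average='weighted' multiplies each class's F1 by its support Ni
# and divides by the total, exactly as in the formula above
print(f1_score(y_true, y_pred, average='weighted'))
```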
Reference: https://towardsdatascience.com/multi-class-metrics-made-simple-part-ii-the-f1-score-ebe8b2c2ca1
Best Answer
Translating a multiclass problem into a set of binary ones (using 1-vs-all or 1-vs-1) is typically done when you want to use algorithms that don't actually have a multiclass formulation, such as SVM.
If you do not plan to change the classification algorithm, you will probably end up with similar results after transforming your problem.
Note that changing the algorithm will not necessarily improve your performance.
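For completeness, here is a minimal sketch of the 1-vs-all decomposition with an SVM in scikit-learn; the iris dataset and linear kernel are my own illustrative choices, not from the answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One binary SVM per class; predictions go to the highest-scoring class
ovr = OneVsRestClassifier(SVC(kernel='linear'))
print(cross_val_score(ovr, X, y, scoring='f1_weighted', cv=5).mean())
```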