Solved – How to get the accuracy and confusion matrix of binary SVM classifiers equivalent to multiclass classification

confusion matrix, multi-class, python, scikit-learn, svm

Consider a 3-class dataset, say, the Iris data.

Suppose we want to do binary SVM classification for this multiclass data using Python's sklearn. So we have the following three binary classification problems: {class1, class2}, {class1, class3}, {class2, class3}.

For each of the above problems, we can get the classification accuracy, precision, recall, f1-score and a 2×2 confusion matrix.

I have the following questions:

  1. How to combine the results of these 3 binary classifiers and get a result equivalent to a multiclass classifier, i.e., how to get the final classification accuracy, precision, recall, f1-score and a 3×3 confusion matrix from the above 3 accuracies, precisions, recalls, f1-scores and 2×2 confusion matrices?

  2. Suppose we have 70%, 80% and 90% accuracies for the above 3 class combinations. Should I take the final accuracy as accuracy.mean() +/- accuracy.std(), and do the same for the other metrics?

  3. Or should I first build the final 3×3 confusion matrix, and then compute the accuracy, precision, recall and f1-score from that matrix?

  4. How does multiclass classification do it internally? Does it use the strategy in question 3? I am not interested in directly applying multiclass classification, only in running binary classifiers and obtaining a result equivalent to multiclass classification.

Now, suppose we also want to perform kFold cross-validation with the above 3 binary classifiers. So for each fold we will have accuracies, precisions, recalls, f1-scores, and 2×2 confusion matrices. In this case, I can get the average accuracy as accuracy.mean() +/- accuracy.std().

Also, in case of kFold cross-validation, for each binary classification problem, I can get an aggregated confusion matrix by adding up the 2×2 confusion matrices of the individual folds. I can also compute the average accuracy, precision, etc. across folds from this aggregated confusion matrix for each binary classifier. However, the results are slightly different from using accuracy.mean() +/- accuracy.std() across folds. I think the latter is more reliable.
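For concreteness, a minimal sketch of the aggregation just described; the `folds` iterable and the label values {0, 1} are placeholders, not part of the original code:

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Sum the per-fold 2x2 confusion matrices element-wise.
aggregated_cm = np.zeros((2, 2), dtype=int)
fold_accuracies = []
for y_true, y_pred in folds:    # placeholder: one (truth, prediction) pair per fold
    # labels=[0, 1] fixes the matrix layout even if a fold misses a class
    aggregated_cm += confusion_matrix(y_true, y_pred, labels=[0, 1])
    fold_accuracies.append(accuracy_score(y_true, y_pred))

# Accuracy from the aggregated matrix weights every sample equally; the mean of
# per-fold accuracies weights every fold equally. The two differ slightly when
# folds have unequal sizes, which explains the discrepancy noted above.
acc_from_matrix = np.trace(aggregated_cm) / aggregated_cm.sum()
acc_mean, acc_std = np.mean(fold_accuracies), np.std(fold_accuracies)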

  1. How to use kFold cross-validation for each binary classification problem, and get the final accuracy, precision, recall, f1-score and 3×3 confusion matrix?

I would appreciate it if someone could answer the above questions with an implementation.

Below is a minimal working example. Please note that part of it is pseudocode for loading and splitting the data into train and test sets:

import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
import time

tic = time.perf_counter()    # time.clock() was removed in Python 3.8
# Import data
iris = datasets.load_iris()
X = iris.data
Y = iris.target

# Now, suppose we have three separate sets {data1, target1}, {data2, target2}, {data3, target3}
# for binary classification, combined pairwise into (data, label) tuples:

#dataset = [(data1 + data2, target1 + target2), (data1 + data3, target1 + target3), (data2 + data3, target2 + target3)]

for data, label in dataset:

    # Each (data, label) pair, e.g. (data1 + data2, target1 + target2), is one
    # binary classification problem; the loop handles the 3 pairs one-by-one.

    K = 10    # number of folds
    kf = KFold(n_splits=K, random_state=None, shuffle=False)

    # Collect per-fold scores so that mean/std are computed across folds,
    # not on a single scalar.
    accuracies, precisions, recalls, f1_scores = [], [], [], []

    for trainIndex, testIndex in kf.split(data):
        trainData, testData = data.iloc[trainIndex], data.iloc[testIndex]
        trainLabel, testLabel = label.iloc[trainIndex], label.iloc[testIndex]

        # So now, we have Train, Test, Train_label, Test_label
        clf = svm.SVC(kernel='rbf')
        clf.fit(trainData, trainLabel)
        predicted_label = clf.predict(testData)

        accuracies.append(accuracy_score(testLabel, predicted_label))
        precisions.append(precision_score(testLabel, predicted_label, average="macro"))
        recalls.append(recall_score(testLabel, predicted_label, average="macro"))
        f1_scores.append(f1_score(testLabel, predicted_label, average="macro"))

        CM = confusion_matrix(testLabel, predicted_label)    # 2x2 matrix for this fold

    accuracies, precisions = np.array(accuracies), np.array(precisions)
    recalls, f1_scores = np.array(recalls), np.array(f1_scores)

    print('Average Accuracy: %0.2f +/- (%0.1f) %%' % (accuracies.mean()*100, accuracies.std()*100))
    print('Average Precision: %0.2f +/- (%0.1f) %%' % (precisions.mean()*100, precisions.std()*100))
    print('Average Recall: %0.2f +/- (%0.1f) %%' % (recalls.mean()*100, recalls.std()*100))
    print('Average F1-Score: %0.2f +/- (%0.1f) %%' % (f1_scores.mean()*100, f1_scores.std()*100))
    print('-------------------------------------------------------------------------------')

toc = time.perf_counter()
print("Total time to run the complete code = ", toc - tic)

Best Answer

First, you should create the 3×3 confusion matrix and then calculate the statistics from it. There are two types of calculation, macro and micro, for the overall statistics (overall precision, overall recall, and so on). The formulas are:

Overall Accuracy

$$ACC_{Overall}=\frac{\sum_{i=1}^{|C|}TP_i}{Population}$$

Precision Micro

$$PPV_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}(TP_i+FP_i)}$$

Precision Macro

$$PPV_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FP_i}$$

Recall Micro

$$TPR_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}(TP_i+FN_i)}$$

Recall Macro

$$TPR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FN_i}$$
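As a quick check of these formulas, here is a small NumPy sketch that computes the overall statistics directly from a 3×3 confusion matrix (the matrix values below match the PyCM example that follows):

import numpy as np

# Rows = actual class, columns = predicted class
cm = np.array([[3, 0, 0],
               [0, 1, 2],
               [2, 1, 3]])

TP = np.diag(cm)              # true positives per class
FP = cm.sum(axis=0) - TP      # column total minus the diagonal
FN = cm.sum(axis=1) - TP      # row total minus the diagonal

overall_acc = TP.sum() / cm.sum()            # ACC_Overall -> 0.58333
ppv_micro   = TP.sum() / (TP + FP).sum()     # PPV_Micro   -> 0.58333
ppv_macro   = (TP / (TP + FP)).mean()        # PPV_Macro   -> 0.56667
tpr_micro   = TP.sum() / (TP + FN).sum()     # TPR_Micro   -> 0.58333
tpr_macro   = (TP / (TP + FN)).mean()        # TPR_Macro   -> 0.61111

Note that for a single-label problem, PPV_Micro and TPR_Micro both equal the overall accuracy, since both denominators sum to the total population.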

I suggest my library PyCM for this purpose.

Example usage:

>>> from pycm import *
>>> y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] # or y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
>>> y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] # or y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred) # Create CM From Data
>>> cm.classes
[0, 1, 2]
>>> cm.table
{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
>>> print(cm)
Predict          0        1        2        
Actual
0                3        0        0        
1                0        1        2        
2                2        1        3        




Overall Statistics : 

95% CI                                                           (0.30439,0.86228)
Bennett_S                                                        0.375
Chi-Squared                                                      6.6
Chi-Squared DF                                                   4
Conditional Entropy                                              0.95915
Cramer_V                                                         0.5244
Cross Entropy                                                    1.59352
Gwet_AC1                                                         0.38931
Joint Entropy                                                    2.45915
KL Divergence                                                    0.09352
Kappa                                                            0.35484
Kappa 95% CI                                                     (-0.07708,0.78675)
Kappa No Prevalence                                              0.16667
Kappa Standard Error                                             0.22036
Kappa Unbiased                                                   0.34426
Lambda A                                                         0.16667
Lambda B                                                         0.42857
Mutual Information                                               0.52421
Overall_ACC                                                      0.58333
Overall_RACC                                                     0.35417
Overall_RACCU                                                    0.36458
PPV_Macro                                                        0.56667
PPV_Micro                                                        0.58333
Phi-Squared                                                      0.55
Reference Entropy                                                1.5
Response Entropy                                                 1.48336
Scott_PI                                                         0.34426
Standard Error                                                   0.14232
Strength_Of_Agreement(Altman)                                    Fair
Strength_Of_Agreement(Cicchetti)                                 Poor
Strength_Of_Agreement(Fleiss)                                    Poor
Strength_Of_Agreement(Landis and Koch)                           Fair
TPR_Macro                                                        0.61111
TPR_Micro                                                        0.58333

Class Statistics :

Classes                                                          0                       1                       2                       
ACC(Accuracy)                                                    0.83333                 0.75                    0.58333                 
BM(Informedness or bookmaker informedness)                       0.77778                 0.22222                 0.16667                 
DOR(Diagnostic odds ratio)                                       None                    4.0                     2.0                     
ERR(Error rate)                                                  0.16667                 0.25                    0.41667                 
F0.5(F0.5 score)                                                 0.65217                 0.45455                 0.57692                 
F1(F1 score - harmonic mean of precision and sensitivity)        0.75                    0.4                     0.54545                 
F2(F2 score)                                                     0.88235                 0.35714                 0.51724                 
FDR(False discovery rate)                                        0.4                     0.5                     0.4                     
FN(False negative/miss/type 2 error)                             0                       2                       3                       
FNR(Miss rate or false negative rate)                            0.0                     0.66667                 0.5                     
FOR(False omission rate)                                         0.0                     0.2                     0.42857                 
FP(False positive/type 1 error/false alarm)                      2                       1                       2                       
FPR(Fall-out or false positive rate)                             0.22222                 0.11111                 0.33333                 
G(G-measure geometric mean of precision and sensitivity)         0.7746                  0.40825                 0.54772                 
LR+(Positive likelihood ratio)                                   4.5                     3.0                     1.5                     
LR-(Negative likelihood ratio)                                   0.0                     0.75                    0.75                    
MCC(Matthews correlation coefficient)                            0.68313                 0.2582                  0.16903                 
MK(Markedness)                                                   0.6                     0.3                     0.17143                 
N(Condition negative)                                            9                       9                       6                       
NPV(Negative predictive value)                                   1.0                     0.8                     0.57143                 
P(Condition positive)                                            3                       3                       6                       
POP(Population)                                                  12                      12                      12                      
PPV(Precision or positive predictive value)                      0.6                     0.5                     0.6                     
PRE(Prevalence)                                                  0.25                    0.25                    0.5                     
RACC(Random accuracy)                                            0.10417                 0.04167                 0.20833                 
RACCU(Random accuracy unbiased)                                  0.11111                 0.0434                  0.21007                 
TN(True negative/correct rejection)                              7                       8                       4                       
TNR(Specificity or true negative rate)                           0.77778                 0.88889                 0.66667                 
TON(Test outcome negative)                                       7                       10                      7                       
TOP(Test outcome positive)                                       5                       2                       5                       
TP(True positive/hit)                                            3                       1                       3                       
TPR(Sensitivity, recall, hit rate, or true positive rate)        1.0                     0.33333                 0.5  

>>> cm.matrix()
Predict          0        1        2        
Actual
0                3        0        0        
1                0        1        2        
2                2        1        3        

>>> cm.normalized_matrix()
Predict          0              1              2              
Actual
0                1.0            0.0            0.0            
1                0.0            0.33333        0.66667        
2                0.33333        0.16667        0.5            
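Finally, to tie this back to the question about what multiclass classification does internally: scikit-learn's SVC handles multiclass via one-vs-one voting, so the equivalent manual construction is to train the three binary SVMs, let each vote on every test point, and build the 3×3 confusion matrix from the voted predictions. A minimal sketch on Iris (the pairing and voting code here is illustrative, not PyCM API):

import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

votes = np.zeros((len(X_test), 3), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2)]:            # the three binary problems
    mask = np.isin(y_train, [a, b])              # keep only the two classes
    clf = svm.SVC(kernel='rbf').fit(X_train[mask], y_train[mask])
    pred = clf.predict(X_test)                   # each test point votes for a or b
    votes[np.arange(len(X_test)), pred] += 1

y_pred = votes.argmax(axis=1)                    # majority vote; ties go to the lowest class index
print(confusion_matrix(y_test, y_pred))          # the final 3x3 confusion matrix
print(classification_report(y_test, y_pred))     # accuracy, precision, recall, F1

The resulting y_pred can be fed to ConfusionMatrix (or to the sklearn metrics) exactly as in the example above, and the overall statistics then follow from the formulas at the start of this answer.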
