Consider 3-class data, say, the Iris data.
Suppose we want to do binary SVM classification for this multiclass data using Python's sklearn. We then have the following three binary classification problems: {class1, class2}, {class1, class3}, {class2, class3}.
For each of these problems, we can compute the classification accuracy, precision, recall, F1-score, and a 2×2 confusion matrix.
I have the following questions:
- How do I combine the results of these 3 binary classifiers to get a result equivalent to a multiclass classifier, i.e., how do I get the final classification accuracy, precision, recall, F1-score, and a 3×3 confusion matrix from the above 3 accuracies, precisions, recalls, F1-scores, and 2×2 confusion matrices?
- Suppose we have 70%, 80%, and 90% accuracies for the above 3 class combinations. Should I report the final accuracy as accuracy.mean() +/- accuracy.std(), and do the same for the other metrics?
- Or should I first build the final 3×3 confusion matrix, and then compute accuracy, precision, recall, and F1-score from that matrix?
- How does a multiclass classifier do it internally? Does it use the strategy in the previous point? I am not interested in directly applying multiclass classification; I only want to run binary classifications and get a result equivalent to multiclass classification.
Now, suppose we also want to perform k-fold cross-validation with the above 3 binary classifiers. For each fold we then have accuracies, precisions, recalls, F1-scores, and 2×2 confusion matrices. In this case, I can report the average accuracy as accuracy.mean() +/- accuracy.std().
Also, with k-fold cross-validation, for each binary classification problem I can get an aggregated confusion matrix by adding up the 2×2 confusion matrices of the folds, and I can compute the average accuracy, precision, etc. across the folds from this aggregated matrix. However, the results are slightly different from using accuracy.mean() +/- accuracy.std() across the folds; I think the latter is more reliable.
- How do I use k-fold cross-validation for each binary classification problem and get the final accuracy, precision, recall, F1-score, and 3×3 confusion matrix?
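As context for the first question: one common way to combine the three binary classifiers is one-vs-one majority voting, which is also the strategy sklearn's SVC uses internally for multiclass problems. Below is a rough sketch on Iris, using a plain train/test split rather than the cross-validation discussed above; the pair list and voting scheme are one reasonable choice, not the only one:

```python
import numpy as np
from collections import Counter
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

iris = datasets.load_iris()
Xtr, Xte, ytr, yte = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

# Train one binary SVM per class pair; each model votes on every test point
pairs = [(0, 1), (0, 2), (1, 2)]
votes = []
for a, b in pairs:
    mask = np.isin(ytr, [a, b])
    clf = svm.SVC(kernel='rbf').fit(Xtr[mask], ytr[mask])
    votes.append(clf.predict(Xte))

votes = np.array(votes)  # shape (3, n_test): one row of votes per binary model
final = np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

cm = confusion_matrix(yte, final)  # the final 3x3 confusion matrix
print(cm)
print('accuracy:', accuracy_score(yte, final))
```

From this voted prediction vector you can then compute accuracy, precision, recall, and F1 exactly as for a native multiclass classifier.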
I would appreciate it if someone could answer the above questions with an implementation.
Below is a minimum working example. Please note that part of it is pseudocode for loading and splitting the data into train and test sets:
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix
import time
tic = time.perf_counter()  # time.clock() was removed in Python 3.8
# Import data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
# Now, suppose we have three separate sets {data1, target1}, {data2, target2}, {data3, target3}
# for binary classification.
#dataset = [{data1 + data2, target1 + target2}, {data1 + data3, target1 + target3}, {data2 + data3, target2 + target3}]
for d in dataset:
    # Pseudo code: pick one pair, say {data1 + data2, target1 + target2}.
    # We process the 3 pairs one-by-one for the 3 binary classification problems.
    #data = data1 + data2
    #label = target1 + target2
    K = 10  # Number of folds
    kf = KFold(n_splits=K, random_state=None, shuffle=False)
    accuracy, precision, recall, f1 = [], [], [], []
    CM = np.zeros((2, 2), dtype=int)  # aggregated confusion matrix over folds
    for trainIndex, testIndex in kf.split(data):
        trainData, testData = data.iloc[trainIndex], data.iloc[testIndex]
        trainLabel, testLabel = label.iloc[trainIndex], label.iloc[testIndex]
        # So now we have trainData, testData, trainLabel, testLabel
        clf = svm.SVC(kernel='rbf')
        clf.fit(trainData, trainLabel)
        predictedLabel = clf.predict(testData)
        accuracy.append(accuracy_score(testLabel, predictedLabel))
        precision.append(precision_score(testLabel, predictedLabel, average="macro"))
        recall.append(recall_score(testLabel, predictedLabel, average="macro"))
        f1.append(f1_score(testLabel, predictedLabel, average="macro"))
        CM += confusion_matrix(testLabel, predictedLabel)
    # Report mean +/- std across the K folds
    accuracy, precision, recall, f1 = map(np.array, (accuracy, precision, recall, f1))
    print('Average Accuracy: %0.2f +/- (%0.1f) %%' % (accuracy.mean()*100, accuracy.std()*100))
    print('Average Precision: %0.2f +/- (%0.1f) %%' % (precision.mean()*100, precision.std()*100))
    print('Average Recall: %0.2f +/- (%0.1f) %%' % (recall.mean()*100, recall.std()*100))
    print('Average F1-Score: %0.2f +/- (%0.1f) %%' % (f1.mean()*100, f1.std()*100))
    print(CM)
    print('-------------------------------------------------------------------------------')
toc = time.perf_counter()
print("Total time to run the complete code = ", toc - tic)
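As an aside on the fold-aggregation question: sklearn's cross_val_predict collects the out-of-fold predictions for every sample, and the confusion matrix of those predictions equals the sum of the per-fold confusion matrices. A sketch on the full 3-class Iris data, assuming the same 10-fold setup as the code above:

```python
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.metrics import confusion_matrix, classification_report

iris = datasets.load_iris()
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is predicted exactly once, by the model trained on the
# folds that did not contain it
pred = cross_val_predict(svm.SVC(kernel='rbf'), iris.data, iris.target, cv=kf)

cm = confusion_matrix(iris.target, pred)  # same as summing the per-fold matrices
print(cm)
print(classification_report(iris.target, pred))
```

Metrics computed from this aggregated matrix pool all folds together, which is why they differ slightly from the mean of per-fold metrics.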
Best Answer
First, you should create the 3×3 confusion matrix and then calculate the statistics from it. There are two types of averaging (macro and micro) for the overall statistics (overall precision, overall recall, etc.); see the formulas below:
Overall Accuracy
$$ACC_{Overall}=\frac{\sum_{i=1}^{|C|}TP_i}{Population}$$
Precision Micro
$$PPV_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}(TP_i+FP_i)}$$
Precision Macro
$$PPV_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FP_i}$$
Recall Micro
$$TPR_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}(TP_i+FN_i)}$$
Recall Macro
$$TPR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FN_i}$$
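All of the formulas above can be computed directly from the 3×3 confusion matrix with NumPy. A sketch using a hypothetical matrix (rows = actual class, columns = predicted class):

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: rows = actual, columns = predicted
CM = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 0, 5]])

TP = np.diag(CM)          # true positives per class
FP = CM.sum(axis=0) - TP  # predicted as class i but actually another class
FN = CM.sum(axis=1) - TP  # actually class i but predicted as another class

overall_acc = TP.sum() / CM.sum()
ppv_micro = TP.sum() / (TP + FP).sum()  # equals overall accuracy for multiclass
ppv_macro = np.mean(TP / (TP + FP))
tpr_micro = TP.sum() / (TP + FN).sum()
tpr_macro = np.mean(TP / (TP + FN))

print(overall_acc, ppv_micro, ppv_macro, tpr_micro, tpr_macro)
```

Note that for a single-label multiclass problem, micro precision, micro recall, and overall accuracy are all the same number, since every misclassification is simultaneously one class's FP and another's FN.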
I suggest my library PyCM for this purpose.
Example usage: