I've found that there isn't much benefit to downsampling/upsampling when the classes are moderately imbalanced (i.e., no worse than 100:1) and you evaluate with a threshold-invariant metric (like AUC). Sampling makes the biggest impact on metrics like F1-score and accuracy, because sampling artificially moves the threshold closer to what might be considered the "optimal" location on an ROC curve. You can see an example of this in the caret documentation.
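To see this concretely, here is a minimal sketch (my own construction, not from the caret docs) using scikit-learn: naive upsampling of the minority class barely moves ROC AUC, but it can shift threshold-dependent metrics like F1 at the default 0.5 cutoff considerably.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def fit_and_score(X_fit, y_fit):
    clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    proba = clf.predict_proba(X_te)[:, 1]
    # ROC AUC is threshold-invariant; F1 here uses the default 0.5 cutoff.
    return roc_auc_score(y_te, proba), f1_score(y_te, proba > 0.5)

print("imbalanced:", fit_and_score(X_tr, y_tr))

# Upsample the minority class by resampling it with replacement.
pos, neg = np.flatnonzero(y_tr == 1), np.flatnonzero(y_tr == 0)
reps = np.random.default_rng(0).choice(pos, size=len(neg), replace=True)
X_up = np.vstack([X_tr[neg], X_tr[reps]])
y_up = np.concatenate([np.zeros(len(neg)), np.ones(len(reps))])
print("upsampled: ", fit_and_score(X_up, y_up))
```

Typically the AUC is nearly identical in both runs, while the F1-score moves because upsampling has effectively relocated the decision threshold.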
I would disagree with @Chris that having a good AUC is better than good precision, as that depends entirely on the context of the problem. Additionally, a good AUC does not necessarily translate to a good Precision-Recall curve when the classes are imbalanced. If a model shows good AUC but still has poor early retrieval, the Precision-Recall curve will leave a lot to be desired. You can see a great example of this happening in this answer to a similar question. For this reason, Saito et al. recommend using the area under the Precision-Recall curve rather than AUC when you have imbalanced classes.
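Here is a small sketch of the effect (the construction is mine, not from the linked answer): with a 1000:1 class ratio, a scorer that ranks most negatives below the positives still achieves a high ROC AUC, while average precision (scikit-learn's summary of the PR curve) comes out far lower, because even a small fraction of high-scoring negatives swamps the positives.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
# 100 positives vs 100,000 negatives, with well-separated score distributions.
y = np.concatenate([np.ones(100), np.zeros(100_000)])
scores = np.concatenate([rng.normal(2.0, 1.0, 100),      # positives
                         rng.normal(0.0, 1.0, 100_000)]) # negatives

print("ROC AUC:", roc_auc_score(y, scores))              # roughly 0.92
print("PR AUC :", average_precision_score(y, scores))    # far lower
```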
Does anyone have a clue why I'm getting way more false positives than false negatives (positive is the minority class)? Thanks in advance for your help!
Because positive is the minority class. There are a lot of negative examples that could become false positives. Conversely, there are fewer positive examples that could become false negatives.
Recall that Recall = Sensitivity $=\dfrac{TP}{(TP+FN)}$
Sensitivity (True Positive Rate) is related to the False Positive Rate (1 − specificity), as visualized by an ROC curve. At one extreme, you call every example positive and have 100% sensitivity with 100% FPR. At the other, you call no example positive and have 0% sensitivity with 0% FPR. When the positive class is the minority, even a relatively small FPR (which you may incur because you are aiming for a high recall = sensitivity = TPR) will end up causing a high number of FPs, because there are so many negative examples.
Since
Precision $=\dfrac{TP}{(TP+FP)}$
even at a relatively low FPR, the FPs will overwhelm the TPs when the number of negative examples is much larger, dragging precision down.
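As a concrete (hypothetical) example: with 10,000 negatives and 100 positives, a classifier operating at 90% TPR and a seemingly modest 5% FPR yields $TP = 90$ but $FP = 500$, so precision $= \dfrac{90}{90+500} \approx 15\%$.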
Alternatively,
Positive classifier: $C^+$
Positive example: $O^+$
Precision = $P(O^+|C^+)=\dfrac{P(C^+|O^+)P(O^+)}{P(C^+)}$
$P(O^+)$ is low when the positive class is small.
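Expanding the denominator with the law of total probability (writing $O^-$ for a negative example) makes this explicit:

Precision $= \dfrac{P(C^+|O^+)\,P(O^+)}{P(C^+|O^+)\,P(O^+) + P(C^+|O^-)\,P(O^-)}$

When $P(O^-) \gg P(O^+)$, even a small false positive rate $P(C^+|O^-)$ makes the second term of the denominator dominate, so precision stays low no matter how high the recall $P(C^+|O^+)$ is.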
Does any of you have advice on what I could do to improve my precision without hurting my recall?
As mentioned by @rinspy, GBC works well in my experience. It will, however, be slower than SVC with a linear kernel, but you can make very shallow trees to speed it up (see the sketch below). Also, more features or more observations might help: for example, there might be some currently un-analyzed feature that is always set to some value in all of your current FPs.
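A minimal sketch, assuming "GBC" refers to scikit-learn's GradientBoostingClassifier (my assumption; the dataset is a placeholder): capping max_depth keeps each boosting stage cheap while the ensemble remains expressive.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder imbalanced dataset; substitute your own X, y.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=200,   # many shallow trees rather than a few deep ones
    max_depth=2,        # very shallow trees -> much faster per boosting stage
    learning_rate=0.1,
    random_state=0,
)
clf.fit(X, y)
print(clf.predict_proba(X[:5])[:, 1])  # probability of the positive class
```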
It might also be worth plotting ROC curves and calibration curves (a sketch follows below). It might be the case that even though the classifier has low precision, it still produces a very useful probability estimate. For example, just knowing that a hard drive has a 500-fold increased probability of failing, even though the absolute probability is fairly small, might be important information.
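Here is a hedged sketch of those two diagnostics (dataset and model are placeholders, not from the thread): even a low-precision classifier can produce well-ordered, well-calibrated probabilities, which these plots make visible.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, proba)
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(fpr, tpr)
ax1.set(xlabel="False Positive Rate", ylabel="True Positive Rate", title="ROC")
ax2.plot(mean_pred, frac_pos, "o-")
ax2.plot([0, 1], [0, 1], "--")  # perfectly calibrated reference line
ax2.set(xlabel="Mean predicted probability", ylabel="Observed frequency",
        title="Calibration")
plt.show()
```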
Also, a low precision essentially means that the classifier returns a lot of false positives. However, this might not be so bad if a false positive is cheap.
Best Answer
The quality of your classifier, as measured by those metrics, will depend on how you intend to use it. E.g.
(Btw, there is unfortunately no consensus on confusion-matrix notation, so when you post a confusion matrix, you might want to specify where the predicted and true values are, even though in most cases we can infer it from the precision/recall values.)
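As a small illustration of why that caveat matters (scikit-learn used here as one example convention): scikit-learn puts true labels on the rows and predictions on the columns, but other tools transpose this.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
# [[1 1]   row 0: true negatives on the left, false positives on the right
#  [1 2]]  row 1: false negatives on the left, true positives on the right
```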