Solved – How to control trade-off between precision and recall

classification · interpretation · model-selection · precision-recall

I applied different classification algorithms, in combination with different sampling techniques, to a dataset and ended up with more than 100 models of varying performance.

I can choose a model for high precision or for high recall, but obviously not both at the same time.

Is there an approach/method/metric that lets me penalize either false positives or false negatives more – depending on which is more important to me – so I can choose the best model out of all the ones I trained?

Best Answer

Sure. You can use the F-beta score.

$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$

Beta = 1 means you value precision and recall equally; a higher beta (beta > 1) means you value recall more than precision, while beta < 1 means you value precision more than recall.
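For example, assuming you are working in Python with scikit-learn, `fbeta_score` computes this directly; a minimal sketch with placeholder labels and predictions (swap in the outputs of your own candidate models):

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

# y_true: ground-truth labels; y_pred: predictions from one candidate model
y_true = [0, 1, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# beta > 1 weights recall more heavily, beta < 1 weights precision more heavily
f2 = fbeta_score(y_true, y_pred, beta=2)     # recall-oriented score
f05 = fbeta_score(y_true, y_pred, beta=0.5)  # precision-oriented score

print(f"precision={precision_score(y_true, y_pred):.3f}, "
      f"recall={recall_score(y_true, y_pred):.3f}, "
      f"F2={f2:.3f}, F0.5={f05:.3f}")
```

You can then compute this score for each of your >100 candidate models, with the beta that reflects how much you care about false negatives versus false positives, and pick the model with the highest value.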

More on wiki: F1 score