Solved – Classifier with adjustable precision vs recall

classification, precision-recall

I am working on a binary classification problem where false positives are much more costly than false negatives; a fairly high rate of false negatives is acceptable. I have tried a number of classifiers in sklearn, for example, but none of them seem to let me adjust the precision-recall tradeoff explicitly (they produce pretty good results, but the tradeoff is not adjustable).

What classifiers have an adjustable precision/recall tradeoff? Is there any way to influence this tradeoff in standard classifiers, e.g. Random Forest or AdaBoost?

Best Answer

Almost all of scikit-learn's classifiers can give decision values (via decision_function or predict_proba).

Based on the decision values it is straightforward to compute precision-recall and/or ROC curves. scikit-learn provides those functions in its metrics submodule (precision_recall_curve, roc_curve).

A minimal example, assuming you have data and labels with appropriate content:

import sklearn.svm
import sklearn.metrics
from matplotlib import pyplot as plt

# Fit any classifier that exposes decision_function (or predict_proba).
clf = sklearn.svm.LinearSVC().fit(data, labels)
decision_values = clf.decision_function(data)

# Precision and recall at every candidate decision threshold.
precision, recall, thresholds = sklearn.metrics.precision_recall_curve(
    labels, decision_values)

plt.plot(recall, precision)
plt.show()
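To actually adjust the tradeoff, pick a threshold off the curve and apply it yourself instead of calling predict. A minimal sketch, using make_classification as a stand-in for your data and labels and an illustrative target_precision of 0.95; it also uses Random Forest, which has no decision_function, to show that predict_proba works the same way:

```python
import numpy as np
import sklearn.metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for your own data and labels.
data, labels = make_classification(n_samples=500, random_state=0)

# Random Forest exposes predict_proba rather than decision_function;
# the positive-class probability serves as the decision value.
clf = RandomForestClassifier(random_state=0).fit(data, labels)
scores = clf.predict_proba(data)[:, 1]

precision, recall, thresholds = sklearn.metrics.precision_recall_curve(
    labels, scores)

# First threshold whose precision meets the target (chosen for
# illustration); raising the threshold trades recall for precision.
target_precision = 0.95
idx = np.argmax(precision[:-1] >= target_precision)
threshold = thresholds[idx]

# Predict positive only when the score clears the chosen threshold.
predictions = (scores >= threshold).astype(int)
```

In practice you would select the threshold on a held-out validation set rather than on the training data, since training-set scores are optimistic.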