Solved – Is it better to compute a ROC curve using predicted probabilities or distances from the separating hyperplane


My question is motivated in part by the possibilities afforded by scikit-learn. In the documentation, there are two examples of how to compute a Receiver Operating Characteristic (ROC) Curve.

One uses predict_proba to

Compute probabilities of possible outcomes for samples […].

while the other uses decision_function, which yields the

Distance of the samples X to the separating hyperplane.

When should each one be used?

Best Answer

I am answering this from a pragmatic perspective, simply by looking at code and deducing from examples. A more theoretical answer would be a great supplement.

Generally both can be used. The difference is well explained here.
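The reason both can be used is that roc_curve only needs scores that *rank* the samples; any monotonic transformation of the scores yields the same curve. For SVC, predict_proba is obtained by fitting a sigmoid (Platt scaling) to the decision values, so the two rankings are nearly identical. A minimal sketch of this (the synthetic dataset via make_classification is an illustrative assumption):

```python
# Sketch: compare the two score types on an SVC; the synthetic
# dataset is an assumption for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True enables predict_proba (Platt scaling) on SVC
clf = SVC(probability=True, random_state=0).fit(X_tr, y_tr)

dist = clf.decision_function(X_te)    # signed distance to hyperplane
prob = clf.predict_proba(X_te)[:, 1]  # calibrated probability

# The AUC depends only on the ranking of the scores, so both
# values come out (near-)identical
print("AUC from decision_function:", roc_auc_score(y_te, dist))
print("AUC from predict_proba:    ", roc_auc_score(y_te, prob))
```

Note that SVC fits the sigmoid with internal cross-validation, so the two scores can in rare cases rank a few samples differently, but the resulting ROC curves are practically the same.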

Yet, most relevantly, not all algorithms offer both predict_proba and decision_function. To my knowledge, every classifier in sklearn allows predict_proba. For some, specifically SVC (Support Vector Classification), both give exactly the same result. To check, I took this example and changed the code to use predict_proba once and decision_function once.

Specifically I changed:

probas_ = classifier.fit(X[train], y[train]).predict_proba(X[test])
# Compute ROC curve and area under the curve
fpr, tpr, thresholds = roc_curve(y[test], probas_[:, 1])

to:

probas_ = classifier.fit(X[train], y[train]).decision_function(X[test])
# Compute ROC curve and area under the curve
fpr, tpr, thresholds = roc_curve(y[test], probas_)

Both yield exactly the same result, as you can see in the images:

[Two ROC curve plots, identical for both methods]

Yet this only holds for SVC, where the distance to the decision plane is used to compute the probability; hence there is no difference in the ROC.

In another example, a specific line of code is relevant to this question:

if hasattr(clf, "decision_function"):
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
else:
    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

Therefore, deducing from the sklearn examples, I would recommend using decision_function wherever possible and, if it is not available, falling back to the probability provided by predict_proba.

Examples for algorithms which do not provide a decision_function in sklearn:

  • KNeighborsClassifier()
  • RandomForestClassifier()
  • GaussianNB()
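Putting the recommendation together, the fallback pattern from the sklearn example can be wrapped in a small helper that works for all of the classifiers above. A sketch, where the synthetic dataset is again an illustrative assumption:

```python
# Sketch of the decision_function-first fallback pattern from the
# sklearn example; the synthetic dataset is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def roc_scores(clf, X):
    """Prefer decision_function; fall back to predict_proba."""
    if hasattr(clf, "decision_function"):
        return clf.decision_function(X)
    return clf.predict_proba(X)[:, 1]

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for clf in (SVC(), RandomForestClassifier(random_state=0), GaussianNB()):
    clf.fit(X_tr, y_tr)
    name = type(clf).__name__
    results[name] = roc_auc_score(y_te, roc_scores(clf, X_te))
    print(name, results[name])
```

Note that SVC() without probability=True has no predict_proba at all, so the decision_function branch is what makes the helper work for it.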