Solved – Can a Precision-Recall curve or a ROC curve be horizontal

Tags: matplotlib, precision-recall, python, roc, scikit-learn

I am working on a binary classification task on imbalanced data.

Accuracy is not very meaningful in this case, so I use Scikit-Learn to compute the Precision-Recall curve and the ROC curve to evaluate model performance.

However, I found that both curves become horizontal lines when I use a Random Forest with a large number of estimators. The same thing happens when I fit an SGD classifier.

The ROC chart is as follows:

[image: ROC curve]

And the Precision-Recall chart:

[image: Precision-Recall curve]

Since Random Forest is randomized, I don't get a horizontal line on every run; sometimes I get regular ROC and PR curves. But the horizontal line is much more common.

Is this normal, or did I make a mistake in my code?

Here is the snippet of my code:

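import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

# classifier, X_train, Y_train, X_test and Y_test are defined earlier (not shown).
classifier.fit(X_train, Y_train)

# Use decision_function when the estimator provides one (e.g. SGDClassifier);
# otherwise fall back to the positive-class probability (e.g. Random Forest).
if hasattr(classifier, "decision_function"):
    scores = classifier.decision_function(X_test)
else:
    scores = classifier.predict_proba(X_test)[:, 1]

# Precision-recall pairs over all score thresholds, plus the summary score.
precision, recall, _ = precision_recall_curve(Y_test, scores, pos_label=1)
average_precision = average_precision_score(Y_test, scores)

plt.plot(recall, precision, label='area = %0.2f' % average_precision, color="green")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall Curve')
plt.legend(loc="lower right")
plt.show()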

Best Answer

I got the answer from my duplicate post here:

Is it possible that Precision-Recall curve or a ROC curve is a horizontal line?

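Horizontal lines are possible but not normal. I got them because I happened to choose a very easy test set. With such data the classifier can rank every positive above every negative, in which case precision stays at 1.0 until full recall is reached and the ROC curve runs flat at a true positive rate of 1.0.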
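
To illustrate why an easy test set produces flat curves, here is a minimal sketch. The labels and scores are made up for illustration (they are not the data from the question); the only assumption is that every positive scores higher than every negative.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Toy labels and scores (hypothetical): every positive scores higher
# than every negative, as can happen on a very easy test set.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.80, 0.90])

precision, recall, _ = precision_recall_curve(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)

# Below full recall, precision never drops under 1.0 (a flat PR curve).
print("min precision below full recall:", precision[recall < 1].min())

# Every ROC point has FPR == 0 or TPR == 1, so the curve is flat at the top.
print("ROC hugs the top-left corner:", bool(np.all((fpr == 0) | (tpr == 1))))

print("average precision:", average_precision_score(y_true, y_score))
print("ROC AUC:", roc_auc_score(y_true, y_score))
```

If a single train/test split gives numbers like these on a problem that is supposed to be hard, the split itself is worth a second look.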

To fix this, apply stratified cross-validation so that the evaluation does not depend on a single, possibly unrepresentative test split.
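
A minimal sketch of one way to do this with scikit-learn follows. It is not the exact code I ran: the synthetic dataset from `make_classification`, the 5 folds, and the choice to pool out-of-fold scores into a single curve are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic imbalanced data (roughly 10% positives), a stand-in for the real X, Y.
X, Y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1],
    flip_y=0.05, random_state=0,
)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)

# Stratified folds keep the class ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold positive-class probabilities: each sample is scored by a
# model that never saw it during training.
scores = cross_val_predict(classifier, X, Y, cv=cv, method="predict_proba")[:, 1]

# One pooled PR curve over all out-of-fold scores.
precision, recall, _ = precision_recall_curve(Y, scores, pos_label=1)
average_precision = average_precision_score(Y, scores)

# Per-fold average precision shows how much the result varies by split.
fold_ap = [
    average_precision_score(Y[test_idx], scores[test_idx])
    for _, test_idx in cv.split(X, Y)
]

print("pooled average precision: %0.2f" % average_precision)
print("per-fold average precision:", np.round(fold_ap, 2))
```

The `precision` and `recall` arrays can be passed to the same plotting code as in the question. A large spread in the per-fold values would suggest that any single split, easy or hard, is a poor basis for judging the model.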

These are the charts I got:

[images: the two resulting curves]
