Solved – Trying to understand reasons behind low true positive rate in confusion matrix

classification, confusion-matrix, logistic

I need help in deciphering a confusion matrix. Here are the results:

[[484  10]
 [108  42]]

Some relevant metrics are:

Prevalence: 23%
Precision: 80.7%
Specificity: 98%
False Positive Rate: 2%
True Positive Rate: 28%
Misclassification Rate: 18%
Accuracy: 81.7%

Moreover, here's the classification report:

             precision    recall  f1-score   support

          0       0.82      0.98      0.89       494
          1       0.81      0.28      0.42       150

avg / total       0.82      0.82      0.78       644

I can't understand why, even though this has good accuracy and precision scores (combined with a low misclassification rate), it performs so badly at identifying true positives. Is it that, because the overall incidence of positives is so low, performing badly on them doesn't make it a bad classifier? I needed something that could actually predict a positive outcome with high precision.

Best Answer

Precision and recall generally trade off against each other and depend on the classification threshold. Precision = TP/(TP+FP) and recall = TP/(TP+FN). Note that if FP is 0, precision is perfect; FN may still be high, however, so recall can be very low.
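
For instance, plugging in the counts from the confusion matrix above (TP = 42, FP = 10, FN = 108) shows how precision can be high while recall stays low:

# Counts taken from the confusion matrix in the question:
# [[TN FP]    [[484  10]
#  [FN TP]] =  [108  42]]
TP, FP, FN = 42, 10, 108

precision = TP / (TP + FP)   # 42 / 52  ~= 0.81
recall    = TP / (TP + FN)   # 42 / 150 = 0.28

print(f"precision = {precision:.3f}, recall = {recall:.3f}")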

It might help to analyze a precision-recall curve to see which thresholds give your desired performance.
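
A rough sketch of how to inspect those threshold trade-offs, assuming a fitted scikit-learn estimator clf and a held-out set X_test, y_test (placeholder names, not from the question):

from sklearn.metrics import precision_recall_curve

# clf, X_test, y_test are placeholder names for your fitted model and test data.
probs = clf.predict_proba(X_test)[:, 1]           # probability of class 1
precision, recall, thresholds = precision_recall_curve(y_test, probs)

# precision/recall have one more entry than thresholds; zip stops at the shortest.
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")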

You can have a very high accuracy given class imbalance by just having the classifier choose the more common class.
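
To see why in your data, note that a trivial "always predict no" rule already gets most of the way to the reported accuracy:

# 494 negatives and 150 positives in the test set (from the support column above).
baseline_accuracy = 494 / (494 + 150)   # ~= 0.767

# The reported accuracy of 0.817 is only about 5 points above this do-nothing
# baseline, which is why accuracy alone is not very informative here.
print(f"majority-class baseline accuracy = {baseline_accuracy:.3f}")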

Update:

Since recall is low for the "yes" cases, the classifier is labeling most of them as "no" (false negatives). It classifies most "no" cases as "no", however, because recall for the "no" class is high. I.e., it is sensitive for "no" cases but not very sensitive for "yes" cases.

Using predict_proba, you can see the actual probability estimates per case given by your classifier. Scikit-learn uses 0.5 as the threshold, I believe; if you change this threshold to 0.1, for example, you can increase your recall for positive cases a lot. Essentially, you lower your bar for calling a case positive. Dichotomizing the outputs is therefore a step taken after estimating parameters: you can think of it as something done after building the classifier that depends on some threshold (although the threshold could technically be considered a hyperparameter of the classifier and therefore something to be estimated as well). So, to increase recall, just change this threshold. To see precision/recall pairs for different thresholds, use precision_recall_curve in sklearn. To see different sensitivity/specificity pairs, use roc_curve.
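
A minimal sketch of that threshold change, again assuming a fitted estimator clf and test data X_test, y_test (hypothetical names):

from sklearn.metrics import classification_report

# clf, X_test, y_test are placeholder names for your fitted model and test data.
probs = clf.predict_proba(X_test)[:, 1]      # estimated P(class = 1) per case

# predict() effectively uses a 0.5 cut-off; lowering it to 0.1 calls far more
# cases positive, raising recall for class 1 (usually at some cost in precision).
y_pred_low = (probs >= 0.1).astype(int)

print(classification_report(y_test, y_pred_low))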

Ultimately, improving the classifier itself (e.g. improving the probability estimates, which in turn indirectly affects recall) might be done by obtaining more data, as suggested. Notably, that may mean more features rather than more observations: consider trying to detect some disease using only age; recall and precision may never be good. Add the result of some genetic test, however, and both recall and precision may become very high even with the same number of observations. Regularization, a different model, and so on may also help.