Accuracy vs. area under the ROC curve

accuracy, auc, reliability, roc

I constructed an ROC curve for a diagnostic system. The area under the curve, estimated non-parametrically, was AUC = 0.89. When I calculated the accuracy at the optimum threshold setting (the point closest to (0, 1)), I got 0.8, which is less than the AUC! When I checked the accuracy at another threshold setting, far from the optimum one, I got 0.92. Is it possible for the accuracy of a diagnostic system at the best threshold setting to be lower than the accuracy at another threshold, and also lower than the area under the curve? See the attached picture please.


Best Answer

It's indeed possible. The key is to remember that accuracy is strongly affected by class imbalance. In your case, you must have more negative samples than positive samples, because when the FPR ($= \frac{FP}{FP+TN}$) is close to 0 and the TPR ($= \frac{TP}{TP+FN}$) is 0.5, your accuracy ($= \frac{TP+TN}{TP+FN+FP+TN}$) is still very high.

To put it another way: since you have many more negative samples, a classifier that predicts 0 all the time will still get a high accuracy, with both FPR and TPR equal to 0.
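The effect of imbalance can be checked numerically. Accuracy at any ROC point is a weighted average of TPR and 1 - FPR, with the class proportions as weights. The sketch below uses that identity; the 10% positive rate and the FPR of 0.02 are made-up values for illustration, not numbers from your data, and `accuracy` is just a helper name.

```python
def accuracy(tpr, fpr, prevalence):
    """Accuracy implied by one ROC point and the fraction of positive samples."""
    # Positives contribute their hit rate (TPR); negatives contribute their
    # correct-rejection rate (1 - FPR), each weighted by its class share.
    return prevalence * tpr + (1.0 - prevalence) * (1.0 - fpr)


# Hypothetical class balance: 10% positives (not a figure from the question).
prevalence = 0.10

# A point with low FPR and only half of the positives detected.
acc_half_tpr = accuracy(tpr=0.5, fpr=0.02, prevalence=prevalence)

# A classifier that always predicts 0: TPR = FPR = 0.
acc_always_negative = accuracy(tpr=0.0, fpr=0.0, prevalence=prevalence)

print(f"TPR 0.5, FPR 0.02: accuracy = {acc_half_tpr:.3f}")
print(f"always predict 0:  accuracy = {acc_always_negative:.3f}")
```

Under these assumed numbers, missing half of the positives still gives an accuracy above 0.93, and never predicting the positive class at all gives 0.90.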

What you call the optimum threshold setting (the point closest to (0, 1)) is just one of many definitions of an optimal threshold: it doesn't necessarily maximize the accuracy.
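To see the two criteria disagree, here is a small sketch that picks an operating point from a list of ROC points twice: once by distance to (0, 1), and once by accuracy. The ROC points and the 10% positive rate are hypothetical, chosen only to mimic the kind of situation you describe.

```python
import math


def accuracy(fpr, tpr, prevalence):
    """Accuracy at an ROC point, given the fraction of positive samples."""
    return prevalence * tpr + (1.0 - prevalence) * (1.0 - fpr)


def distance_to_corner(fpr, tpr):
    """Euclidean distance from an ROC point to the ideal corner (0, 1)."""
    return math.hypot(fpr, 1.0 - tpr)


# Hypothetical (FPR, TPR) points, one per candidate threshold.
roc_points = [(0.0, 0.0), (0.01, 0.5), (0.05, 0.6),
              (0.2, 0.85), (0.4, 0.95), (1.0, 1.0)]
prevalence = 0.10  # assumed share of positive samples

# Criterion 1: the point closest to (0, 1).
closest = min(roc_points, key=lambda p: distance_to_corner(*p))

# Criterion 2: the point with the highest accuracy.
most_accurate = max(roc_points, key=lambda p: accuracy(*p, prevalence))

for name, point in [("closest to (0, 1)", closest),
                    ("highest accuracy", most_accurate)]:
    print(f"{name}: FPR={point[0]}, TPR={point[1]}, "
          f"accuracy={accuracy(*point, prevalence):.3f}")
```

With these assumed values, the point closest to (0, 1) is (0.2, 0.85) with an accuracy of about 0.805, while the accuracy-maximizing point is (0.01, 0.5) with an accuracy of about 0.941. The distance criterion weights TPR and FPR equally regardless of class balance, whereas accuracy weights them by how many positives and negatives there actually are.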
