Solved – two questions; how to interpret the AUROC (area under the ROC curve)

auc, logistic, predictive-models, regression, roc

Suppose I have fitted a logistic regression model that predicts $P(Y=1|\boldsymbol{X})$, the probability of a disease, where presence of the disease is encoded as $1$ and absence as $0$. The AUROC (area under the ROC curve) indicates high discriminatory power, say $85\%$: a randomly chosen person with the disease will have a higher predicted probability than a randomly chosen person without it $85\%$ of the time.

Suppose the model gives a subject $A$ a predicted probability of $0.6$, which seems high compared to the other subjects.

Would it be correct to say that there is an $85\%$ chance that $A$ has the disease?

Can you give me some examples of how I can utilize my regression model, knowing that it has strong discriminatory power?

Best Answer

Would it be correct to say that there is an 85% chance that $A$ has the disease?

No. Assuming your model is correct and well-calibrated, the probability that $A$ has the disease is the model's estimate for $A$: here, $0.6$, not $0.85$.

The meaning of AUROC (area under the ROC curve, to distinguish from the less-common area under the precision-recall curve) is exactly what you state: given a randomly-selected diseased person and a randomly-selected healthy person, there is an 85% chance that your model ranks the diseased person higher than the healthy person.
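This ranking interpretation can be checked directly. Here is a minimal pure-Python sketch (the scores are made up for illustration) that computes the AUROC as the fraction of (diseased, healthy) pairs the model ranks correctly:

```python
def auroc_pairwise(pos_scores, neg_scores):
    """AUROC via its probabilistic definition: the fraction of
    (diseased, healthy) pairs in which the diseased subject receives
    the higher score; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted probabilities, for illustration only.
diseased = [0.90, 0.80, 0.60, 0.45]
healthy = [0.70, 0.50, 0.40, 0.30, 0.20]

print(auroc_pairwise(diseased, healthy))  # 0.85: 17 of 20 pairs ranked correctly
```

Note that this quantity depends only on the ordering of the scores, so any monotone transformation of the predicted probabilities leaves the AUROC unchanged. That is one way to see that strong discrimination says nothing about whether an individual predicted probability such as $0.6$ is well-calibrated.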

Can you give me some examples on how I can utilize my regression model knowing that it has strong discriminatory power?

Suppose you need to construct a procedure that makes binary decisions without human intervention; for example, test results are reported automatically, with no clinician reviewing each case. You can find every diseased individual (TPR of 1.0) by labeling everyone as diseased, but your FPR will also be 1.0. Alternatively, you can avoid all false positives by labeling no one as diseased, but at the cost of also capturing no diseased individuals.

A ROC curve traces the tradeoffs between these two extremes, i.e. the estimated TPR and FPR at every decision-value cutoff. ROC curves are commonly summarized by the AUROC, but a model with a higher AUROC does not necessarily have a better TPR/FPR tradeoff at the specific cutoff you care about.
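As a sketch of how the curve traces those tradeoffs, this pure-Python example (again with made-up scores) computes the TPR and FPR at every candidate cutoff, predicting "diseased" whenever the score is at or above the cutoff:

```python
def roc_points(pos_scores, neg_scores):
    """Return (cutoff, FPR, TPR) for every distinct score used as a
    cutoff, predicting 'diseased' when score >= cutoff."""
    points = []
    for c in sorted(set(pos_scores + neg_scores), reverse=True):
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)
        points.append((c, fpr, tpr))
    return points

diseased = [0.90, 0.80, 0.60, 0.45]  # hypothetical scores
healthy = [0.70, 0.50, 0.40, 0.30, 0.20]

for cutoff, fpr, tpr in roc_points(diseased, healthy):
    print(f"cutoff={cutoff:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest cutoff reproduces the "label everyone diseased" extreme (TPR and FPR both 1.0), a cutoff above the highest score reproduces the opposite extreme, and the ROC curve is the set of tradeoffs in between.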

It's common in the machine learning community to compare two or more alternative models on the basis of AUROC, but this does not imply that AUROC is useful in general, or even for the particular purpose of that machine learning project.
