An alternative given by [1] is to compute the interval for the logit AUC:
$ \log \left( \frac{AUC}{1-AUC} \right) \pm \Phi ^{-1} \left( 1 - \frac{\alpha}{2} \right) \frac{\widehat{se}(AUC)}{AUC(1 - AUC)} $
where $\widehat{se}(AUC)$ is the estimated standard error of the AUC and $\Phi^{-1}$ is the standard normal quantile function. Back-transforming the endpoints to the AUC scale gives an asymmetric interval that always stays within $(0, 1)$. In your case, you would get a 95% CI of $(0.38, 0.81)$.
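As a sketch of how this works in practice (the AUC estimate and standard error below are hypothetical, and the function name is my own):

```python
import math
from statistics import NormalDist

def logit_auc_ci(auc, se_auc, alpha=0.05):
    """CI for the AUC via the logit transform (delta method).

    se_auc is the estimated standard error of the AUC, e.g. from
    the Hanley-McNeil formula or a bootstrap.
    """
    logit = math.log(auc / (1 - auc))
    # standard normal quantile for a two-sided (1 - alpha) interval
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * se_auc / (auc * (1 - auc))
    # back-transform the endpoints to the (0, 1) AUC scale
    expit = lambda t: 1 / (1 + math.exp(-t))
    return expit(logit - half_width), expit(logit + half_width)

lo, hi = logit_auc_ci(auc=0.62, se_auc=0.11)  # hypothetical estimates
```

Unlike the naive Wald interval on the AUC scale, the resulting interval is asymmetric around the point estimate and cannot extend past 0 or 1.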
If you are frequently dealing with high AUCs and small sample sizes, you may want to have a look at [2], which shows that no single method can optimally compute confidence intervals for all ROC curves.
[1] Pepe MS, The Statistical Evaluation of Medical Tests for Classification and Prediction, OUP 2003, p. 107
[2] Obuchowski NA, Lieber ML, Confidence bounds when the estimated ROC area is 1.0, Acad Radiol. 2002, 9 (5) p. 526-30
The question is quite vague, so I am going to assume you want to choose an appropriate performance measure to compare different models. For a good overview of the key differences between ROC and PR curves, you can refer to the following paper: The Relationship Between Precision-Recall and ROC Curves by Davis and Goadrich.
To quote Davis and Goadrich:
However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance.
ROC curves plot the true positive rate (TPR) against the false positive rate (FPR). To be more explicit:
$$FPR = \frac{FP}{FP+TN}, \quad TPR=\frac{TP}{TP+FN}.$$
PR curves plot precision versus recall (TPR), or more explicitly:
$$recall = \frac{TP}{TP+FN} = TPR,\quad precision = \frac{TP}{TP+FP}$$
Precision is directly influenced by class (im)balance, since $FP$ scales with the number of negatives, whereas TPR depends only on the positives. This is why ROC curves do not capture such effects.
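A quick numerical check (the counts are made up for illustration): scale the negative class by 10 while keeping the classifier's per-class behavior fixed, and only precision moves:

```python
def metrics(tp, fp, tn, fn):
    """TPR, FPR and precision from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # depends on positives only
    fpr = fp / (fp + tn)        # depends on negatives only
    precision = tp / (tp + fp)  # mixes positives and negatives
    return tpr, fpr, precision

# balanced data vs the same classifier on 10x more negatives
balanced = metrics(tp=80, fp=10, tn=90, fn=20)
skewed = metrics(tp=80, fp=100, tn=900, fn=20)
```

TPR and FPR are unchanged (0.8 and 0.1), but precision drops from about 0.89 to about 0.44: exactly the effect a ROC curve cannot show.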
Precision-recall curves are better at highlighting differences between models on highly imbalanced data sets. If you want to compare different models in imbalanced settings, the area under the PR curve will likely exhibit larger differences than the area under the ROC curve.
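As a rough illustration (synthetic scores, not a real benchmark; assumes scikit-learn is available), area under the ROC curve can look reassuring on imbalanced data while average precision, a common estimate of area under the PR curve, is dragged down toward the positive prevalence:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
# 1% positives: negative scores ~ N(0, 1), positive scores ~ N(1, 1)
neg_scores = rng.normal(0.0, 1.0, size=4950)
pos_scores = rng.normal(1.0, 1.0, size=50)
y = np.concatenate([np.zeros(4950), np.ones(50)])
scores = np.concatenate([neg_scores, pos_scores])

auroc = roc_auc_score(y, scores)           # insensitive to the 99:1 ratio
aupr = average_precision_score(y, scores)  # penalized by false positives
```

The same scores produce a respectable AUROC but a much lower AUPR, which is why the PR summary separates models more clearly in imbalanced settings.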
That said, ROC curves are much more common (even if they are less suited). Depending on your audience, ROC curves may be the lingua franca, so using those is probably the safer choice. If one model completely dominates another in PR space (i.e. it always has higher precision over the entire recall range), it will also dominate in ROC space. If the curves cross in either space, they will also cross in the other. In other words, the main conclusions will be similar no matter which curve you use.
Shameless advertisement. As an additional example, you could have a look at one of my papers in which I report both ROC and PR curves in an imbalanced setting. Figure 3 contains ROC and PR curves for identical models, clearly showing the difference between the two. To compare area under the PR curve versus area under the ROC curve, you can compare Tables 1-2 (AUPR) and Tables 3-4 (AUROC), where you can see that AUPR shows much larger differences between individual models than AUROC. This emphasizes the suitability of PR curves once more.
In your situation it would be fine to plot a ROC curve, and to calculate the area under that curve, but this should be thought of as supplemental to your main analysis, rather than the primary analysis itself. Instead, you want to fit a logistic regression model.
The logistic regression model will come standard with a test of the model as a whole. (Actually, since you have only one variable, that p-value will be the same as the p-value for your test result variable.) That p-value is the one you are after. The model will allow you to calculate the predicted probability of an observation being diseased. A Receiver Operating Characteristic (ROC) curve tells you how sensitivity and specificity trade off as you use different thresholds to convert the predicted probability into a predicted classification. Since the predicted probability is a function of your test result variable, it also tells you how they trade off if you use different test result values as your threshold.
If you are not terribly familiar with logistic regression, there are some resources available on the internet (besides the Wikipedia page linked above): if you use R, the UCLA stats help website is generally excellent and has a relevant page here.