ROC curve 101
An ROC curve visualizes the predictive performance of a classifier across levels of conservatism. In simple terms, it illustrates the price you pay in false positive rate to increase the true positive rate. The conservatism is controlled via a threshold on the confidence scores: instances scoring above it are assigned the positive label, the rest the negative label.
The x-axis can be interpreted as a measure of the classifier's liberalism: it depicts the false positive rate (1 - specificity). The y-axis represents how good the classifier is at detecting positives: it depicts the true positive rate (sensitivity). A perfect classifier's ROC curve passes through $(0,1)$, meaning it classifies all positives correctly without a single false positive. This results in an area under the curve of exactly $1$.
Intuitively, a more conservative classifier (one that labels fewer instances as positive) has higher precision and lower sensitivity than a more liberal one. When the threshold for predicting positive decreases (i.e. the required confidence score drops), both the false positive rate and the sensitivity can only increase. This is why an ROC curve is always monotonically non-decreasing.
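To make the threshold mechanics concrete, here is a minimal R sketch (the labels and scores are made up for illustration, not taken from your question) showing how sweeping the threshold trades sensitivity against false positive rate:

```r
# Toy data: true 0/1 labels and model confidence scores (hypothetical).
labels <- c(1, 1, 1, 0, 1, 0, 0, 0)
scores <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10)

# TPR and FPR of the classifier at a given decision threshold.
rates_at <- function(threshold) {
  pred <- as.numeric(scores >= threshold)                  # positive above threshold
  tpr  <- sum(pred == 1 & labels == 1) / sum(labels == 1)  # sensitivity
  fpr  <- sum(pred == 1 & labels == 0) / sum(labels == 0)  # 1 - specificity
  c(TPR = tpr, FPR = fpr)
}

rates_at(0.90)  # conservative: TPR = 0.25, FPR = 0
rates_at(0.35)  # liberal: TPR = 1, but FPR = 0.5
```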
Plotting an ROC curve
You do not need to recompute predictions for various thresholds, as you suggest. An ROC curve is computed from the ranking produced by your classifier (e.g. your logistic regression model).
Use the model to predict every test point once. You get a vector of confidence scores, call it $\mathbf{\hat{Y}}$. From this vector you can produce the full ROC curve (or at least an estimate thereof). The distinct values in $\mathbf{\hat{Y}}$ are your thresholds. Since you use logistic regression, the confidence scores in $\mathbf{\hat{Y}}$ are probabilities, i.e. in $[0,1]$.
Now simply iterate over the sorted values, adjusting the TP/TN/FP/FN counts as you go, and you can compute the ROC curve point by point. The number of points in your ROC curve equals the length of $\mathbf{\hat{Y}}$, assuming there are no ties among the predictions.
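In code, that iteration boils down to sorting by score and taking cumulative sums. A minimal R sketch, again on made-up data:

```r
# Toy data as before (hypothetical labels and confidence scores).
labels <- c(1, 1, 1, 0, 1, 0, 0, 0)
scores <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10)

ord    <- order(scores, decreasing = TRUE)  # rank test points by confidence
sorted <- labels[ord]

# Walking down the ranking, each step flips one more prediction to positive:
# a new true positive if its label is 1, a new false positive if it is 0.
tpr <- cumsum(sorted == 1) / sum(sorted == 1)
fpr <- cumsum(sorted == 0) / sum(sorted == 0)

# Prepend the all-negative operating point (0, 0): one ROC point per test
# example, plus this endpoint.
roc_points <- data.frame(FPR = c(0, fpr), TPR = c(0, tpr))
roc_points
```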
To plot the final result, use a function that plots in zero-order hold (ZOH) rather than with linear interpolation between points, like MATLAB's stairs or base R's plot with type = "s". Also keep this in mind when computing the area under the curve (AUC): if you use linear interpolation instead of ZOH, you may end up with an optimistic estimate, up to the area under the convex hull (AUCH).
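Continuing from the roc_points data frame built in the previous sketch, a ZOH plot and the corresponding step-function AUC in base R could look like this:

```r
# ZOH plot of the ROC curve: type = "s" draws stair steps between points.
plot(roc_points$FPR, roc_points$TPR, type = "s",
     xlab = "FPR (1 - specificity)", ylab = "TPR (sensitivity)")
abline(0, 1, lty = 2)  # chance diagonal, for reference

# ZOH area: each FPR interval contributes width * the TPR held at its left end.
auc_zoh <- sum(diff(roc_points$FPR) * head(roc_points$TPR, -1))
auc_zoh  # 0.9375 for the toy data above
```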
I agree with your concerns.
Given that people in practice will seldom choose an FPR cut-off of 0.5 or higher, why would people prefer an ROC curve with FPR ranging from 0 to 1 and use the full AUC value (i.e. calculate the entire area under the ROC curve) instead of just reporting the area from, say, 0 to 0.25 or 0 to 0.5? Is that called "partial AUC"?
- I'm a big fan of having the complete ROC, as it gives much more information than just the sensitivity/specificity pair of one working point of a classifier.
- For the same reason, I'm not a big fan of summarizing all that information even further into a single number. But if you have to do so, I agree that it is better to restrict the calculation to the parts of the ROC that are relevant for the application; see the sketch below.
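Yes, that is called the partial AUC. A sketch of how one could compute it with pROC (toy data again; note that pROC expresses an FPR range of [0, 0.25] as a specificity range of [0.75, 1]):

```r
library(pROC)

# Hypothetical labels and scores.
labels <- c(1, 1, 1, 0, 1, 0, 0, 0)
scores <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10)

r <- roc(labels, scores)
auc(r)                            # full AUC
auc(r, partial.auc = c(1, 0.75),  # area restricted to FPR <= 0.25
    partial.auc.focus = "specificity")
```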
In the figure below, what can we say about the performance of the three models? The AUC values are: green (0.805), red (0.815), blue (0.768). The red curve turns out to be superior, but as you can see, its superiority only shows for FPR > 0.2. Thanks :)
- That depends entirely on your application. In your example, if high specificity is needed, then the green classifier would be best. If high sensitivity is needed, go for the red one.
As to the comparison of classifiers: there are lots of questions and answers here discussing this. Summary:
- classifier comparison is far more difficult than one would expect at first
- not all classifier performance measures are good for this task. Read @FrankHarrell's answers, and go for so-called proper scoring rules (e.g. the Brier score / mean squared error); a minimal example follows after this list.
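As a minimal example of such a proper scoring rule, the Brier score is simply the mean squared error between predicted probabilities and the 0/1 outcomes (lower is better):

```r
# Hypothetical 0/1 outcomes and predicted probabilities.
labels <- c(1, 1, 1, 0, 1, 0, 0, 0)
probs  <- c(0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10)

brier <- mean((probs - labels)^2)  # Brier score, a proper scoring rule
brier
```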
Best Answer
Edit: since you apparently do have scores and actual outcomes, you can calculate it. One tool that can do the job is the pROC package in R. It contains an auc function that takes the predicted scores and the actual outcomes as arguments. Have a look at its documentation: http://cran.r-project.org/web/packages/pROC/index.html
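A minimal sketch of such a call (the vectors here are hypothetical stand-ins for your actual scores and outcomes):

```r
library(pROC)

# Hypothetical stand-ins: 0/1 true outcomes and predicted scores.
outcomes <- c(0, 1, 1, 0, 1, 0, 1, 1)
scores   <- c(0.2, 0.8, 0.6, 0.3, 0.9, 0.4, 0.7, 0.5)

auc(outcomes, scores)  # builds the ROC internally and returns the area
```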
-- INITIAL ANSWER --
There is no such tool, because you lack the necessary information: you need a score for each prediction as well as its true outcome. Without that information, it is impossible to calculate the AUC.