Solved – Calculate true positive rate (TPR) and false positive rate (FPR) from prediction values to form ROC curve

Tags: classification, r, roc, threshold

I have the prediction values of different algorithms, as shown below. My question is: how can I compute the FPR and TPR needed to form a ROC curve? If I compute a confusion matrix I only get a single point, and I cannot draw a curve from that.

            TRUE     FALSE
    20 0.3804752 0.6195248
    22 0.4220737 0.5779263
    25 0.5292302 0.4707698
    5  0.1566432 0.8433568
    7  0.3121428 0.6878572
    8  0.2075050 0.7924950
    9  0.1507119 0.8492881
    14 0.2217667 0.7782333
    15 0.6088052 0.3911948
    18 0.4402029 0.5597971

Best Answer

You need both the predicted class probabilities (as you have them in your example) and the observed, i.e. actual, class labels to compare your predictions against.

From those, the steps of computing a ROC curve are simple:

  1. Compute the class predictions for all possible thresholds, using one of the predicted class probabilities as the predictor: observations whose probability lies above the threshold are predicted as $P$, those below as $N$. From these class predictions, compute the TPR and FPR (= 1-TNR) for the associated threshold. You therefore get one TPR/FPR pair per possible threshold value (this is where it differs from the confusion matrix you mentioned, which is computed only once, at a single chosen threshold). A small base-R sketch of both steps follows the list.

  2. Plot all the TPR values against the corresponding FPR values to obtain the ROC curve.
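
To make these two steps concrete, here is a minimal base-R sketch. The probabilities are the TRUE column from your output; the labels vector is hypothetical and only serves as an illustration, since computing TPR and FPR requires the actual classes:

    # predicted probabilities of the positive class (TRUE column from the question)
    probs  <- c(0.3804752, 0.4220737, 0.5292302, 0.1566432, 0.3121428,
                0.2075050, 0.1507119, 0.2217667, 0.6088052, 0.4402029)
    # hypothetical observed classes, for illustration only
    labels <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

    # one TPR/FPR pair per candidate threshold
    thresholds <- sort(unique(probs), decreasing = TRUE)
    roc_points <- t(sapply(thresholds, function(th) {
      pred <- probs >= th                            # class prediction at this threshold
      c(threshold = th,
        TPR = sum(pred & labels)  / sum(labels),     # TP / (TP + FN)
        FPR = sum(pred & !labels) / sum(!labels))    # FP / (FP + TN)
    }))
    roc_points
    # plot(roc_points[, "FPR"], roc_points[, "TPR"], type = "b")   # rough ROC curve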

In practice you do not have to do this by hand; here's a minimal explanatory example using the pROC package:

    # demo data and model to have some predictions to generate a ROC curve from
    library(caret)
    model <- train(iris[51:150, 3:4], factor(iris[51:150, 5]),
                   method = 'lda', metric = 'ROC',
                   trControl = trainControl(method = 'repeatedcv', number = 10, repeats = 20,
                                            savePredictions = T, summaryFunction = twoClassSummary,
                                            classProbs = T))

    # this is what the predictions ('pred') and actual values ('obs') look like:
    head(model$pred)

    #         pred        obs   versicolor    virginica rowIndex parameter     Resample
    # 1 versicolor versicolor 0.9986561370 1.343863e-03        4      none Fold01.Rep01
    # 2 versicolor versicolor 0.9459910404 5.400896e-02        5      none Fold01.Rep01
    # 3 versicolor versicolor 0.9686428821 3.135712e-02       27      none Fold01.Rep01
    # 4 versicolor versicolor 0.9578875407 4.211246e-02       29      none Fold01.Rep01
    # 5 versicolor versicolor 0.9999888477 1.115228e-05       49      none Fold01.Rep01
    # 6  virginica  virginica 0.0001359808 9.998640e-01       56      none Fold01.Rep01

    # 1. calculate the TPR and FPR (= 1-TNR) for all possible thresholds. Use the class probability as predictor
    library(pROC)
    # levels = c(controls, cases), so versicolor is treated as the positive class
    myRoc <- roc(response = model$pred$obs, predictor = model$pred$versicolor,
                 levels = c('virginica', 'versicolor'), direction = '<')
    # this is what the TPR (sensitivities) and TNR (specificities) over all possible thresholds look like (showing some rows from around the middle):
    data.frame(myRoc$sensitivities, myRoc$specificities, myRoc$thresholds)[1000:1010,]

    #      myRoc.sensitivities myRoc.specificities myRoc.thresholds
    # 1000                0.92               0.951        0.5078172
    # 1001                0.92               0.952        0.5070752
    # 1002                0.92               0.953        0.5065697
    # 1003                0.92               0.954        0.5040185
    # 1004                0.92               0.955        0.5003522
    # 1005                0.92               0.956        0.4931192
    # 1006                0.92               0.957        0.4834452
    # 1007                0.92               0.958        0.4743940
    # 1008                0.92               0.959        0.4560578
    # 1009                0.92               0.960        0.4217748
    # 1010                0.92               0.961        0.3984879

    # 2. plot TPR against FPR to obtain the ROC curve
    plot(myRoc)

[ROC curve plot from plot(myRoc)]
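
Note that plot(myRoc) draws sensitivity against specificity with a reversed x-axis, which is equivalent to plotting TPR against FPR. If you prefer an explicit FPR axis, here is a small optional sketch using the myRoc object from above:

    plot(myRoc, legacy.axes = TRUE)       # x-axis shown as 1 - specificity (i.e. FPR)

    # or extract the rates yourself
    fpr <- 1 - myRoc$specificities        # FPR = 1 - TNR
    tpr <- myRoc$sensitivities            # TPR
    plot(fpr, tpr, type = 'l', xlab = 'FPR', ylab = 'TPR')

    auc(myRoc)                            # area under the ROC curve, if of interest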
