Solved – Finding true positive / negative and false positive / negative rates using R

r, roc

I have a data frame with two classes. I want to find the true positive and false positive rate and then plot the ROC curve.

I tried this:

new <- data.frame(ytrue=c(1,0,1,1,0,0,1,0,1,0),
                  ypred=c(0.98,0.94,0.86,0.74,0.73,0.64,0.53,0.39,0.34,0.31))
new
 ytrue ypred
1      1  0.98
2      0  0.94
3      1  0.86
4      1  0.74
5      0  0.73
6      0  0.64
7      1  0.53
8      0  0.39
9      1  0.34
10     0  0.31

library(ROCR)
pred <- prediction( new$ypred, new$ytrue )
pred
An object of class "prediction"
Slot "predictions":
[[1]]
 [1] 0.98 0.94 0.86 0.74 0.73 0.64 0.53 0.39 0.34 0.31


Slot "labels":
[[1]]
 [1] 1 0 1 1 0 0 1 0 1 0
Levels: 0 < 1


Slot "cutoffs":
[[1]]
 [1]  Inf 0.98 0.94 0.86 0.74 0.73 0.64 0.53 0.39 0.34 0.31


Slot "fp":
[[1]]
 [1] 0 0 1 1 1 2 3 3 4 4 5


Slot "tp":
[[1]]
 [1] 0 1 1 2 3 3 3 4 4 5 5


Slot "tn":
[[1]]
 [1] 5 5 4 4 4 3 2 2 1 1 0


Slot "fn":
[[1]]
 [1] 5 4 4 3 2 2 2 1 1 0 0


Slot "n.pos":
[[1]]
[1] 5


Slot "n.neg":
[[1]]
[1] 5


Slot "n.pos.pred":
[[1]]
 [1]  0  1  2  3  4  5  6  7  8  9 10


Slot "n.neg.pred":
[[1]]
 [1] 10  9  8  7  6  5  4  3  2  1  0

perf <- performance( pred, "tpr", "fpr" )
perf
An object of class "performance"
Slot "x.name":
[1] "False positive rate"

Slot "y.name":
[1] "True positive rate"

Slot "alpha.name":
[1] "Cutoff"

Slot "x.values":
[[1]]
 [1] 0.0 0.0 0.2 0.2 0.2 0.4 0.6 0.6 0.8 0.8 1.0


Slot "y.values":
[[1]]
 [1] 0.0 0.2 0.2 0.4 0.6 0.6 0.6 0.8 0.8 1.0 1.0


Slot "alpha.values":
[[1]]
 [1]  Inf 0.98 0.94 0.86 0.74 0.73 0.64 0.53 0.39 0.34 0.31
plot(perf)

[Figure: ROC curve produced by plot(perf)]

I am unsure how to interpret the output, and the ROC curve looks like a step function.

Best Answer

The plot is fine. prediction() from the ROCR package returns an object, pred, with several components. Two of them, pred$tp and pred$fp, hold the number of true positives and false positives at each cutoff. To get the rates, divide each by its maximum, which equals the number of actual positives (for tp) or actual negatives (for fp); both are 5 in this case. (With current ROCR versions, use the slot syntax shown in the UPDATE below.)

tpr <- pred$tp/max(pred$tp)
fpr <- pred$fp/max(pred$fp)

plot(fpr, tpr, type = "l")

These are the same values you get from performance(pred, "tpr", "fpr"): check the x.values and y.values components of perf, which are what plot(perf) draws. plot(fpr, tpr, type = "l") therefore gives an identical plot. The curve looks like a step function because the data set is small: with five positives and five negatives, each rate can only take the values 0, 0.2, ..., 1, and each new cutoff changes only one of the two rates.
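To see where the counts come from, here is a minimal base R sketch that rebuilds them by hand from the question's data, without ROCR. It assumes an observation is predicted positive when its score is greater than or equal to the cutoff, which is consistent with the cutoffs, tp and fp values printed in the question. The variable names are only illustrative.

```r
# Data from the question
ytrue <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
ypred <- c(0.98, 0.94, 0.86, 0.74, 0.73, 0.64, 0.53, 0.39, 0.34, 0.31)

# Cutoffs: Inf (nothing predicted positive), then each distinct score, descending
cutoffs <- c(Inf, sort(unique(ypred), decreasing = TRUE))

# Assumption: predict positive when the score is >= the cutoff
tp <- sapply(cutoffs, function(k) sum(ypred >= k & ytrue == 1))
fp <- sapply(cutoffs, function(k) sum(ypred >= k & ytrue == 0))

# Rates: divide by the number of actual positives / actual negatives
tpr <- tp / sum(ytrue == 1)
fpr <- fp / sum(ytrue == 0)

# The negative rates are the complements of the positive rates
tnr <- 1 - fpr
fnr <- 1 - tpr

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
```

Lowering the cutoff past one score adds exactly one observation to the predicted positives, so either tp or fp goes up by one and the curve moves one step up or one step right.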


UPDATE

The structure of the object returned by prediction() appears to have changed in more recent ROCR versions: prediction() now returns an S4 object, so the information is stored in slots. With the OP's data, the result now looks like this:

An object of class "prediction"
Slot "predictions":
[[1]]
 [1] 0.98 0.94 0.86 0.74 0.73 0.64 0.53 0.39 0.34 0.31


Slot "labels":
[[1]]
 [1] 1 0 1 1 0 0 1 0 1 0
Levels: 0 < 1


Slot "cutoffs":
[[1]]
 [1]  Inf 0.98 0.94 0.86 0.74 0.73 0.64 0.53 0.39 0.34 0.31


Slot "fp":
[[1]]
 [1] 0 0 1 1 1 2 3 3 4 4 5


Slot "tp":
[[1]]
 [1] 0 1 1 2 3 3 3 4 4 5 5


Slot "tn":
[[1]]
 [1] 5 5 4 4 4 3 2 2 1 1 0


Slot "fn":
[[1]]
 [1] 5 4 4 3 2 2 2 1 1 0 0


Slot "n.pos":
[[1]]
[1] 5


Slot "n.neg":
[[1]]
[1] 5


Slot "n.pos.pred":
[[1]]
 [1]  0  1  2  3  4  5  6  7  8  9 10


Slot "n.neg.pred":
[[1]]
 [1] 10  9  8  7  6  5  4  3  2  1  0

Each slot is also a list, presumably so that several sets of predictions can be stored together. The code in my original answer therefore needs to be adapted to the new format, something like this:

tpr <- pred@tp[[1]]/max(pred@tp[[1]])
fpr <- pred@fp[[1]]/max(pred@fp[[1]])

plot(fpr, tpr, type = "l")
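Since the title also asks about the negative rates, here is a sketch of how all four rates could be read from the S4 object. It assumes a ROCR version with the slot layout printed above, and it divides by the class totals in the n.pos and n.neg slots instead of by max(). The data are copied from the question.

```r
library(ROCR)

# Data from the question
ytrue <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
ypred <- c(0.98, 0.94, 0.86, 0.74, 0.73, 0.64, 0.53, 0.39, 0.34, 0.31)
pred <- prediction(ypred, ytrue)

# Rates from the raw counts, using the class totals stored in the object
tpr <- pred@tp[[1]] / pred@n.pos[[1]]  # true positive rate
fpr <- pred@fp[[1]] / pred@n.neg[[1]]  # false positive rate
tnr <- pred@tn[[1]] / pred@n.neg[[1]]  # true negative rate
fnr <- pred@fn[[1]] / pred@n.pos[[1]]  # false negative rate

# Compare with the values computed by performance()
perf <- performance(pred, "tpr", "fpr")
all.equal(tpr, perf@y.values[[1]])
all.equal(fpr, perf@x.values[[1]])
```

Dividing by n.pos and n.neg gives the same result as dividing by max() here, because the last cutoff classifies every observation as positive, so the maximum count equals the class total.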