pROC versus ROCR

This is a very basic question, but I don't understand why the following code gives different results with pROC and ROCR (see plot).

library(pROC)   # roc(), plot.roc()
library(ROCR)   # prediction(), performance()

Exp  <- c(1,0,1,1,1,1,1,0,0,1)                                     # observed classes
Pred <- c(63.2,110.8,55.57,34.40,34.16,53.8,76.3,76.3,94.8,61.3)   # predictor scores

# ########################## pROC ##########################
rocobj <- roc(response = Exp, predictor = Pred)
plot.roc(rocobj, main = "pROC")

# ########################## ROCR ##########################
ROCRpred <- prediction(Pred, Exp)
plot(performance(ROCRpred, "tpr", "fpr"), main = "ROCR")

The interpretation of 1/0 should be the same in both packages, so why is it not?

Another question: what if I want to use, say, P/N as levels? Is there an order in which I have to define them?
[Image: the pROC and ROCR plots]

Best Answer

The prediction function

If you do not specify which class is the positive case and which is the negative case, the prediction function has to decide this automatically. It does so as follows:

> prediction

...

if (label.format == "ordered") {
    if (!is.null(label.ordering)) {
        stop(paste("'labels' is already ordered. No additional", 
            "'label.ordering' must be supplied."))
    }
    else {
        levels <- levels(labels[[1]])
    }
}
else {
    if (is.null(label.ordering)) {
        if (label.format == "factor") 
            levels <- sort(levels(labels[[1]]))
        else levels <- sort(unique(unlist(labels)))
    }
    else {
        if (!setequal(unique(unlist(labels)), label.ordering)) {
            stop("Label ordering does not match class labels.")
        }
        levels <- label.ordering
    }
    for (i in 1:length(labels)) {
        if (is.factor(labels)) 
            labels[[i]] <- ordered(as.character(labels[[i]]), 
              levels = levels)
        else labels[[i]] <- ordered(labels[[i]], levels = levels)
    }
}

....
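The default branch above boils down to sorting the distinct labels and treating the first one as negative and the last one as positive. Here is a minimal base-R sketch of that rule; infer_ordering is a hypothetical helper written for illustration, not part of ROCR. It also answers the P/N question: "N" sorts before "P", so without label.ordering, N would be treated as negative and P as positive.

```r
# Hypothetical helper (not part of ROCR) mirroring the default branch of
# prediction(): ordered factors keep their level order, unordered factors use
# sort(levels(x)), and numeric/character/logical labels use sort(unique(x)).
# The first element is treated as negative, the last as positive.
infer_ordering <- function(labels) {
  lv <- if (is.ordered(labels)) {
    levels(labels)
  } else if (is.factor(labels)) {
    sort(levels(labels))
  } else {
    sort(unique(labels))
  }
  c(negative = lv[1], positive = lv[length(lv)])
}

Exp <- c(1, 0, 1, 1, 1, 1, 1, 0, 0, 1)
infer_ordering(Exp)                      # 0 is negative, 1 is positive
infer_ordering(c("P", "N", "P", "N"))    # "N" is negative, "P" is positive

# An ordered factor keeps its declared order: here "P" negative, "N" positive.
infer_ordering(factor(c("P", "N"), levels = c("P", "N"), ordered = TRUE))
```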

By specifying label.ordering = c(1,0), which declares 1 as the negative and 0 as the positive class, as in

ROCRpred <- prediction(Pred, Exp, label.ordering = c(1,0))

you will get what you want.
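Why does swapping the roles fix the curve? In this data the three 0 cases carry the highest scores. The base-R sketch below (auc_mw is a hypothetical helper, not a ROCR function) computes the AUC with the Mann-Whitney pair-counting formula for either choice of positive class. With 1 as positive the AUC is far below 0.5, which matches the curve in the lower half of the ROCR plot; with 0 as positive, which is what label.ordering = c(1,0) declares, it is the complement.

```r
# Hypothetical helper: AUC as the fraction of (positive, negative) pairs in
# which the positive case has the higher score; ties count as one half.
auc_mw <- function(scores, labels, positive) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

Exp  <- c(1, 0, 1, 1, 1, 1, 1, 0, 0, 1)
Pred <- c(63.2, 110.8, 55.57, 34.40, 34.16, 53.8, 76.3, 76.3, 94.8, 61.3)

auc_mw(Pred, Exp, positive = 1)   # 0.5/21, about 0.024 (1 treated as positive)
auc_mw(Pred, Exp, positive = 0)   # 20.5/21, about 0.976 (0 treated as positive)
```

The only tie is the pair of 76.3 scores (one case of each class), which is why the numerators end in .5.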

Note that you can get help in R by typing help(prediction), and typing just the function name, prediction, prints the function's source code. (Of course, this works for any other function too.)
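A few base-R ways to inspect a function, shown here on median.default because it ships with R; the commented ROCR lines assume ROCR is installed.

```r
# Print a function's source by typing its name without parentheses:
median.default

# Show only the argument list, or just the names of the formal arguments:
args(median.default)
names(formals(median.default))

# Search the deparsed source for a keyword (here "sort"):
grep("sort", deparse(median.default), value = TRUE)

# The same works for ROCR once it is installed:
# help(prediction)
# grep("label.ordering", deparse(ROCR::prediction), value = TRUE)
```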


Conventions

You had better use the following "conventions":

  1. Use the 'higher' label for the positive class and the 'lower' label for the negative class.
  2. Use higher scores to indicate a stronger tendency towards the positive class. (Currently you give the highest prediction scores to the lowest class label.)
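Convention 1 can be made explicit by supplying the labels as an ordered factor, since prediction() takes the lower level of an ordered factor as the negative class and the upper level as the positive class. A minimal sketch recoding Exp as P/N (the vector pn is only an illustration); the prediction() call is commented out so the snippet runs without ROCR.

```r
# Exp recoded as P/N: 1 -> "P", 0 -> "N".
pn <- c("P", "N", "P", "P", "P", "P", "P", "N", "N", "P")

# Lower level = negative class, upper level = positive class.
lab <- factor(pn, levels = c("N", "P"), ordered = TRUE)
levels(lab)       # "N" "P"

# With ROCR loaded and Pred defined as in the question:
# ROCRpred <- prediction(Pred, lab)
```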

The following is a quote from the help file of the prediction function:

Since scoring classifiers give relative tendencies towards a negative (low scores) or positive (high scores) class, it has to be declared which class label denotes the negative, and which the positive class. Ideally, labels should be supplied as ordered factor(s), the lower level corresponding to the negative class, the upper level to the positive class. If the labels are factors (unordered), numeric, logical or characters, ordering of the labels is inferred from R's built-in < relation (e.g. 0 < 1, -1 < 1, 'a' < 'b', FALSE < TRUE). Use label.ordering to override this default ordering. Please note that the ordering can be locale-dependent e.g. for character labels '-1' and '1'.

So if you use

ROCRpred <- prediction(-Pred, Exp)

it works as well, in the sense that the curve lies in the upper half. Note, however, that there can still be a difference: prediction(-Pred, Exp) is not the same as prediction(Pred, -Exp) (an image showing this appears later in this post).
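The difference can be checked without ROCR. In prediction(Pred, -Exp) the labels become -1 and 0; since -1 < 0, the original 0 cases become the positive class, whereas prediction(-Pred, Exp) keeps 1 as positive and negates the scores. The base-R sketch below uses two hypothetical helpers (roc_points and auc_trap, not ROCR functions) that compute empirical ROC points with a ">= threshold" rule, which is an assumption about how the cutoffs are applied. Both variants end up with the same AUC, but the curves pass through different points.

```r
# Hypothetical helper: empirical ROC points, predicting "positive" when
# score >= threshold, with thresholds running from the highest score down.
roc_points <- function(scores, is_pos) {
  thr <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(scores[is_pos] >= t))
  fpr <- sapply(thr, function(t) mean(scores[!is_pos] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Hypothetical helper: trapezoidal area under ordered (fpr, tpr) points.
auc_trap <- function(p) {
  sum(diff(p$fpr) * (head(p$tpr, -1) + tail(p$tpr, -1)) / 2)
}

Exp  <- c(1, 0, 1, 1, 1, 1, 1, 0, 0, 1)
Pred <- c(63.2, 110.8, 55.57, 34.40, 34.16, 53.8, 76.3, 76.3, 94.8, 61.3)

# Like prediction(-Pred, Exp): scores negated, 1 stays the positive class.
a <- roc_points(-Pred, Exp == 1)
# Like prediction(Pred, -Exp): labels -1 and 0, so the original 0s are positive.
b <- roc_points(Pred, -Exp == 0)

auc_trap(a); auc_trap(b)   # both 41/42, about 0.976
all.equal(a, b)            # not TRUE: the two curves differ point by point
```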


Why did roc work but prediction did not?

The roc function from the pROC package automatically determines the direction, i.e. whether a higher score corresponds to a higher or a lower probability of the positive class.
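As far as I know, pROC documents its default direction = "auto" as comparing the median predictor value of the two groups. Below is a base-R sketch of that rule on the question's data; it mirrors the documented behaviour and is not pROC's actual code. By default pROC takes the first level of the response (here 0) as controls and the second (here 1) as cases; in its notation "<" means controls have lower values than cases and ">" the opposite.

```r
Exp  <- c(1, 0, 1, 1, 1, 1, 1, 0, 0, 1)
Pred <- c(63.2, 110.8, 55.57, 34.40, 34.16, 53.8, 76.3, 76.3, 94.8, 61.3)

# Median score per class: controls (0) and cases (1).
med <- sapply(split(Pred, Exp), median)
med                       # 0: 94.8, 1: 55.57

# Assumed auto-direction rule: a higher control median gives ">".
direction <- if (med[["0"]] > med[["1"]]) ">" else "<"
direction                 # ">": higher scores point towards the 0 class
```

Because the direction is flipped automatically, pROC draws the curve in the upper half even though the scores violate convention 2.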

You still have to be clear about which cases are positive and which are negative, though, because you can get different results:

[Image: difference positives/negatives]