I'm doing some work with random forests in R using the randomForest package, and I've run into something that seems odd to me. Even when the data is completely random, the AUC is never less than 0.5. For example, when I run the following:
library(randomForest)
df.sanity <- data.frame(A=sample(1:100, 2000, replace=T), B=sample(126:159, 2000, replace=T), C=sample(10:2000, 1000, replace=T), D=sample(1:2, 2000, replace=T), E=sample(30:40, 2000, replace=T), Class=as.factor(sample(0:1, 2000, replace=T)))
rf <- randomForest(x=df.sanity[1:1000,c("A", "B", "C", "D", "E")], y=df.sanity[1:1000, "Class"])
preds <- predict(rf, newdata=df.sanity[1001:2000,], type="prob")
auc(obs=df.sanity[1001:2000, "Class"], pred=preds[,2])
No matter how many times I run it, the AUC is never less than 0.5. It's often a bit over (up to 0.54 from what I've seen), but never less.
The only other AUC implementation I've used is Weka's, and I've seen AUCs < 0.5 there. Does the randomForest package automatically flip the predictions to the reverse if the AUC is ever less than 0.5, or is there something else I'm misunderstanding here?
Best Answer
There is no auc() function in the randomForest package. But based on the argument names you used (obs and pred), I think you might have used the auc() function in the SDMTools package. And yes, this function does flip the result if the calculated AUC is less than 0.5.

This might be seen as a nice convenience feature (it's easy to forget exactly how AUC functions want their arguments coded, and if you get an AUC < 0.5 in real life, you have usually just used the inverse/incorrect coding of the response vector), but I think it's a bad idea. If you try models that are (almost) as bad as random, e.g., in simulations, you will get estimates that are biased high (compared to the 'correct' AUC estimator).
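To make the bias concrete, here is a minimal base-R sketch. It does not use SDMTools; the helper auc_raw() is a hypothetical name introduced only for this illustration, and the pmax() line merely mimics the "flip if below 0.5" behaviour described above. Class sizes and the number of replications are arbitrary choices.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case gets a higher score than a randomly chosen negative one.
# auc_raw is a hypothetical helper, not a function from any package.
auc_raw <- function(obs, pred) {
  r <- rank(pred)                      # average ranks handle ties
  n_pos <- sum(obs == 1)
  n_neg <- sum(obs == 0)
  (sum(r[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
raw <- replicate(2000, {
  obs  <- rep(0:1, each = 100)         # balanced classes, 200 cases
  pred <- runif(200)                   # scores unrelated to the class
  auc_raw(obs, pred)
})

# Mimic an implementation that reports 1 - AUC whenever AUC < 0.5
flipped <- pmax(raw, 1 - raw)

mean(raw)      # centred on 0.5, as expected for random scores
mean(flipped)  # systematically above 0.5
min(flipped)   # never below 0.5
```

The unflipped estimates scatter symmetrically around 0.5, while the flipped ones fold the lower half of that distribution onto the upper half, which is consistent with never seeing a value below 0.5 on random data.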
If you want the correct AUC estimates, you can use any of the following:
- ROC() from the Epi package. It draws a nice ROC plot, with the AUC embedded, and also returns an object with the AUC stored as the AUC element.
- rcorr.cens() from the Hmisc package (which rms builds on). It returns a named vector with the AUC (the C index) as the first element.
- roc() from the pROC package, if you manually specify the direction argument. It returns an object with the AUC stored as the auc element.
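As a rough illustration of the pROC option, the sketch below fixes both levels and direction so that the curve is not reoriented automatically. The simulated obs and pred vectors are made up for the example, and it assumes pROC is installed.

```r
library(pROC)

set.seed(1)
obs  <- rep(0:1, each = 100)   # 0 = control, 1 = case (simulated)
pred <- runif(200)             # random scores, unrelated to obs

# levels = c(control, case); direction = "<" states that controls are
# expected to score lower than cases, so pROC does not pick a direction
# for us (the default, direction = "auto", chooses it from the data).
fixed <- roc(response = obs, predictor = pred,
             levels = c(0, 1), direction = "<")
auc_fixed <- as.numeric(fixed$auc)

# Reversing the scores under the same fixed direction gives 1 - AUC,
# which shows that values below 0.5 are reported rather than flipped.
reversed <- roc(response = obs, predictor = -pred,
                levels = c(0, 1), direction = "<")
auc_reversed <- as.numeric(reversed$auc)

auc_fixed
auc_reversed
```

With the direction fixed, one of the two values lands below 0.5 unless the AUC happens to be exactly 0.5, which is the behaviour you want when checking a model against random data.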