Solved – Choosing the correct AUC value with the ROCR package

auc, r

I have somewhat of an odd question.

I'm working on a dataset and, after training my model, I'm trying to compute the correct AUC using the ROCR package.

I'm using these commands:

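  library(ROCR)

  # Class probabilities from the fitted model; column 2 is assumed to be the positive class
  pred <- predict(model, testData, type = "prob")
  pr <- prediction(pred[, 2], testData$var)

  # ROC curve: true positive rate against false positive rate
  prf <- performance(pr, measure = "tpr", x.measure = "fpr")

  # Area under the ROC curve
  auc <- performance(pr, measure = "auc")
  auc <- auc@y.values[[1]]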

I've noticed that whether or not I pass type='prob' changes the AUC: if I predict just the binary output instead of the probabilities, I get a different value. Which value should I use? I'm a bit lost.

Best Answer

What's your model? Note that predict is not a ROCR function; ROCR starts from prediction. It's your job to specify the model correctly. Of course, what the model generates has a significant impact on your AUC statistic, because you're comparing your predictions against testData$var. You'll need to think about what exactly you want to predict. ROC is just a statistical tool: garbage in, garbage out, so make sure you don't give it garbage inputs.

What's testData$var? It's supposed to hold the true class labels. How do they relate to your binary or probability predictions? (I don't know what model you're using.)

Note that you don't choose your AUC statistic; you choose the model that maximizes the AUC as much as possible.
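To see why the two AUC values in the question differ, here is a minimal base-R sketch on made-up data. It does not call ROCR; `auc_rank` is a hypothetical helper that computes the AUC through the rank-sum (Mann-Whitney) identity, which should agree with the area under the ROC curve when ties are counted as half. The labels, the probabilities, and the 0.5 cutoff are all assumptions chosen for illustration.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# rank() assigns average ranks to ties, so tied scores count as half.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up true labels and predicted probabilities of class 1
labels <- c(0, 0, 0, 0, 1, 1, 1, 1)
probs  <- c(0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.80, 0.90)

# Hard class predictions obtained by cutting the probabilities at 0.5
hard <- as.numeric(probs > 0.5)

auc_prob <- auc_rank(probs, labels)  # uses the full ranking of the cases
auc_hard <- auc_rank(hard, labels)   # only two distinct score values remain

print(c(probabilities = auc_prob, hard_labels = auc_hard))
```

Hard labels collapse the ROC curve to a single operating point, so the AUC computed from them describes one threshold only. If the model can output probabilities, the probability-based AUC is usually the one that reflects how well the model ranks the cases.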

Related Solutions

Cross Validation – What to Do After Nested Cross-Validation?

To answer your initial question (What to do AFTER Nested Cross-Validation?):

Nested cross-validation gives you several scores, each based on test data that the algorithm has not yet seen. Ordinary ("non-nested") CV gives you just one such score, based on one held-out test set. Nested CV therefore lets you better evaluate the true performance of your model.

After nested CV, you fit the chosen model on the whole dataset and then use it to make predictions on new, unlabeled data (data that are not part of your 1000 observations).

I'm not 100% sure that you are performing proper nested CV with an outer and an inner loop. To understand nested CV, I found this description helpful: [image]

(Petersohn, Temporal Video Segmentation, Vogt Verlag, 2010, p. 34)
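The outer and inner loops can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the tuned hyperparameter is a polynomial degree, and `mse_fit` and `choose_degree` are hypothetical helper names. None of these come from the original question.

```r
set.seed(1)

# Simulated data (assumption): a noisy sine curve
n <- 120
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.3))
degrees <- 1:5  # candidate hyperparameter values (polynomial degree)

# Fit a degree-d polynomial on `train` and return its MSE on `test`.
mse_fit <- function(train, test, d) {
  fit <- lm(y ~ poly(x, d), data = train)
  mean((test$y - predict(fit, newdata = test))^2)
}

# Inner loop: pick the degree with the lowest k-fold CV error on `train`.
choose_degree <- function(train, k = 4) {
  folds <- sample(rep(seq_len(k), length.out = nrow(train)))
  cv_err <- sapply(degrees, function(d) {
    mean(sapply(seq_len(k), function(i) {
      mse_fit(train[folds != i, ], train[folds == i, ], d)
    }))
  })
  degrees[which.min(cv_err)]
}

# Outer loop: tuning happens inside each outer training set only,
# so every outer test fold is unseen by the selection step.
outer_k <- 5
outer_folds <- sample(rep(seq_len(outer_k), length.out = n))
outer_scores <- sapply(seq_len(outer_k), function(i) {
  train <- dat[outer_folds != i, ]
  test  <- dat[outer_folds == i, ]
  mse_fit(train, test, choose_degree(train))
})

# After nested CV: rerun the selection on all data and refit once.
final_degree <- choose_degree(dat)
final_model  <- lm(y ~ poly(x, final_degree), data = dat)

print(outer_scores)
```

Here `outer_scores` estimates how well the whole procedure (tuning plus fitting) performs, while `final_model` is what would be used on new data. Rerunning the selection on the full dataset is one reading of "fit the chosen model on the whole dataset"; other conventions exist.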

Thoughts on bootstrapping as a better alternative than (nested) CV can be found here.

P.S.: I presume you are more likely to get answers if you ask only one or two questions per post instead of seven. You may want to split them up so that others can find them more easily.

Solved – Differences in AUC calculation between pROC and ROCR

You never specified what counts as a positive result. Try the following:

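library(pROC)
# Compute the AUC under each direction setting in turn and compare the values
aucPROC <- as.numeric(auc(temp.ds[, 1], temp.ds[, 2], direction = "auto"))
aucPROC <- as.numeric(auc(temp.ds[, 1], temp.ds[, 2], direction = "<"))
aucPROC <- as.numeric(auc(temp.ds[, 1], temp.ds[, 2], direction = ">"))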

Also, make sure that positive and negative examples are assigned to the correct levels, although with 0s and 1s they will probably be detected correctly:

aucPROC <- as.numeric(auc(temp.ds[,1],temp.ds[,2], levels = c(0, 1), direction="<"))

See ?roc for more details.
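The effect of the direction setting can be illustrated without either package. The sketch below uses made-up data laid out like temp.ds (column 1 the true class, column 2 the predictor) and a hypothetical helper, `prob_greater`, built on the Mann-Whitney statistic from stats::wilcox.test. As I understand the pROC documentation, direction="<" corresponds to "controls score lower than cases" and direction=">" to the opposite, so the two AUC values should sum to 1.

```r
# Made-up data (assumption): here LOW scores indicate class 1
temp_ds <- data.frame(
  label = c(0, 0, 0, 1, 1, 1),
  score = c(0.9, 0.7, 0.4, 0.6, 0.3, 0.1)
)

# Hypothetical helper: estimates P(a > b), counting ties as half,
# from the Mann-Whitney W statistic divided by the number of pairs.
prob_greater <- function(a, b) {
  w <- wilcox.test(a, b, exact = FALSE)$statistic
  unname(w) / (length(a) * length(b))
}

cases    <- temp_ds$score[temp_ds$label == 1]
controls <- temp_ds$score[temp_ds$label == 0]

# AUC if higher scores are taken to indicate a case ("controls < cases")
auc_cases_higher <- prob_greater(cases, controls)

# AUC if lower scores are taken to indicate a case ("controls > cases")
auc_cases_lower <- prob_greater(controls, cases)

print(c(cases_higher = auc_cases_higher, cases_lower = auc_cases_lower))
```

If two tools report AUC values that sum to 1 on the same data, a mismatch in the assumed direction or in which level counts as positive is a likely explanation.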
