Solved – R – ROCR Library – Understanding predict and prediction method

prediction, r, random forest

My name is Abhi and I am trying to understand the difference between predict and prediction.

I am using the R language and my IDE is RStudio. I have created a random forest model (R package randomForest):

library(randomForest)
library(ROCR)

myModel <- randomForest(Survived ~ ., data = modelData[, -1], importance = TRUE)
modelResponses <- predict(myModel, type = "prob") # I am guessing this gives the probability of survival for each passenger
temp1 <- modelResponses[, 2]
pred <- prediction(temp1, trainData$Survived) # Not sure what the pred object is

Now here are my questions

  1. What is the pred object?
  2. I have seen some code which uses the pred object to plot the ROC curve and compute the AUC. I know temp1 is the probability of survival for each record. Say the probability of survival for a particular record is 0.55. How does the prediction function know whether to classify this as survived or not survived?
  3. How do I use this model to classify new data? Until now I was using modelResponses = predict(myModel, type = "prob"), but now I am not so sure. It is the same confusion as in item 2: how does the system determine the best cut-off point for the probabilities?

Thanks a lot guys. Any help would be much appreciated.

Regards,

Best Answer

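Here are the answers:

1) pred is an intermediate object of class prediction that ROCR builds from your scores and the true labels. You pass it to performance() to compute metrics such as the ROC curve, the AUC, or the costs associated with various false classifications, and then plot or inspect the result.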
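To make point 1 concrete, here is a minimal sketch of what can be done with such an object. It assumes the ROCR package is installed and uses simulated scores and labels as stand-ins for temp1 and the Survived column; the simulated data is purely illustrative and not from the question.

```r
library(ROCR)

# Simulated stand-ins for temp1 (scores) and the true labels; not real data
set.seed(42)
labels <- rbinom(200, 1, 0.4)
scores <- ifelse(labels == 1, rbeta(200, 4, 2), rbeta(200, 2, 4))

pred <- prediction(scores, labels)

class(pred)      # an S4 object of class "prediction"
slotNames(pred)  # includes predictions, labels, cutoffs, tp, fp, tn, fn

# ROC curve: true positive rate against false positive rate over all cut-offs
roc <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(roc)

# The AUC is a single number, stored in the y.values slot
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```

Note that the AUC is a scalar summary of the ROC curve, so it is read out of the performance object rather than plotted.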

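2) The prediction function does not decide whether a record with probability 0.55 counts as survived or not survived. Because you gave it estimated class probabilities, the metrics and plots derived from pred show the classifications for every cut-off point between 0 and 1, not just for one arbitrary cut-off such as 0.5.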

3) The system cannot determine by itself which cut-off point for the class probability should be used. You have to decide that yourself after consulting various metrics. Are the costs of a false positive classification the same as those of a false negative? If they are not equal, you must adjust the cut-off point accordingly.
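One way to act on point 3 is sketched below, assuming the ROCR package and simulated data. The cost values (a false negative taken to be five times as expensive as a false positive) and the new scores are hypothetical choices for illustration only. With a real model, the new scores would come from something like predict(myModel, newdata = newData, type = "prob")[, 2], where newData is a hypothetical data frame of unseen records.

```r
library(ROCR)

# Simulated scores and labels; not real data
set.seed(1)
labels <- rbinom(300, 1, 0.4)
scores <- ifelse(labels == 1, rbeta(300, 4, 2), rbeta(300, 2, 4))
pred <- prediction(scores, labels)

# Hypothetical misclassification costs: adjust these to your own problem
cost <- performance(pred, measure = "cost", cost.fp = 1, cost.fn = 5)

cutoffs <- cost@x.values[[1]]  # candidate cut-off points
costs   <- cost@y.values[[1]]  # cost of classifying at each cut-off
best_cutoff <- cutoffs[which.min(costs)]
best_cutoff

# Classify new records by comparing their scores with the chosen cut-off
new_scores <- c(0.10, 0.55, 0.90)  # hypothetical probabilities of survival
new_class <- ifelse(new_scores >= best_cutoff, "survived", "not survived")
new_class
```

Changing cost.fp and cost.fn moves the chosen cut-off, which is exactly why it cannot be fixed without knowing the costs of each kind of error.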

Alternatively, you could skip hard classification, work with the raw class probabilities, and select for further analysis only those cases where the ratio of the class probability to the mean class probability is high enough AND the costs/benefits warrant it.
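A rough sketch of that last idea in base R follows. The probabilities and the ratio threshold of 1.4 are made-up values for illustration; whether a selected case is worth acting on still depends on the costs and benefits, which this sketch does not model.

```r
# Hypothetical predicted probabilities for five records
probs <- c(a = 0.15, b = 0.55, c = 0.92, d = 0.30, e = 0.81)

# Ratio of each class probability to the mean class probability
ratio <- probs / mean(probs)

# Arbitrary threshold, chosen only for this example
ratio_threshold <- 1.4
selected <- names(ratio)[ratio >= ratio_threshold]
selected  # records "c" and "e"
```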
