LOOCV in the caret package (randomForest example): results are not reproducible

Tags: caret, cross-validation, r, random forest

Here is my question.
As far as I know, there is only one way to perform LOOCV for a model: test each of the N observations against a model trained on the other N-1 observations.
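Each LOOCV split is fixed by which single row is held out, so the partition itself involves no randomness. Any run-to-run variation therefore has to come from the model being fitted, not from the splits. The following is a minimal base-R sketch of the N splits. It uses n = 60 to match irisData, and loo_splits is an illustrative name, not part of any package.

```r
# Leave-one-out splits for n rows: split k trains on every row except k
# and tests on row k alone. There is exactly one such set of splits.
n <- 60
loo_splits <- lapply(seq_len(n), function(k) {
  list(train = setdiff(seq_len(n), k), test = k)
})

length(loo_splits)  # 60 splits
# every training set has n - 1 rows
all(vapply(loo_splits, function(s) length(s$train), integer(1)) == n - 1)
# every row is held out exactly once
identical(sort(vapply(loo_splits, function(s) s$test, integer(1))), seq_len(n))
```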

For example, this should compute a LOOCV AUC:

library('randomForest')
library('pROC') # for the ROC curve / AUC

irisData <- iris[1:60,]   # 50 setosa + 10 versicolor
irisData$Species <- as.factor(as.character(irisData$Species))  # drop the unused 'virginica' level

predictions <- numeric(60)

# leave out row k, train on the other 59 rows, predict P(versicolor) for row k
for (k in 1:60) {
  fit_k <- randomForest(Species ~ Sepal.Length, data = irisData[-k,], mtry = 1)
  predictions[k] <- predict(fit_k, type = "prob", newdata = irisData[k,, drop = FALSE])[2]
}
auc(irisData$Species, predictions, direction = "<", levels = levels(irisData$Species))

Area under the curve: 0.776

Repeating the code, I always obtain the same value.
Using caret, I run:

library('caret')

fitControl <- trainControl(
  method = 'LOOCV',                 # leave-one-out cross-validation
  number = 1,                       # not used when method = 'LOOCV' (one resample per row)
  savePredictions = 'final',        # saves predictions for the optimal tuning parameter
  classProbs = TRUE,                # return class probabilities
  summaryFunction = twoClassSummary # reports ROC (AUC), sensitivity and specificity
)

train(Species ~ Sepal.Length, data = irisData, method = 'rf',
      tuneGrid = data.frame(mtry = 1), trControl = fitControl)

Across repeated runs, this gives AUC values between 0.770 and 0.780.
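One way to confirm that caret's LOOCV ROC is the same quantity as the loop's AUC is to look at the stored predictions. With savePredictions = 'final', the held-out predictions are kept in the pred element of the fitted object. As far as I understand caret's LOOCV workflow, the reported ROC is computed once over all 60 pooled hold-out predictions. The sketch below reruns the call above, without fixed seeds, so the value itself still varies between runs, and recomputes the AUC with the same pROC call as the manual loop. The names fit, held_out and pooled_auc are illustrative. metric = "ROC" is added only because twoClassSummary does not return Accuracy, which is train's default metric.

```r
library(caret)
library(randomForest)
library(pROC)

irisData <- iris[1:60, ]
irisData$Species <- as.factor(as.character(irisData$Species))

fitControl <- trainControl(method = "LOOCV", savePredictions = "final",
                           classProbs = TRUE, summaryFunction = twoClassSummary)

# metric = "ROC" because twoClassSummary does not return Accuracy
fit <- train(Species ~ Sepal.Length, data = irisData, method = "rf",
             metric = "ROC", tuneGrid = data.frame(mtry = 1),
             trControl = fitControl)

# One held-out row per resample: observed class (obs) and class probabilities
held_out <- fit$pred
nrow(held_out)  # 60

# Same auc() call as the manual loop, applied to caret's held-out probabilities
pooled_auc <- auc(held_out$obs, held_out$versicolor, direction = "<",
                  levels = levels(irisData$Species))
pooled_auc
fit$results$ROC  # caret's reported LOOCV ROC, expected to match pooled_auc
```

Both numbers come from the same fit, so they should agree with each other even though a rerun without seeds gives a slightly different value.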

I tried changing number to 60, but nothing changes.

Where is the issue?

Best.

Best Answer

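randomForest is stochastic: each tree is grown on a bootstrap sample of the training rows, so the fitted forest, and therefore its predictions, can change from run to run unless you fix the random seed. Set it before every fit: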

library('randomForest')
library('pROC')

irisData <- iris[1:60,]
irisData$Species <- as.factor(as.character(irisData$Species))

predictions <- numeric(60)

for (k in 1:60) {
  set.seed(1)  # same seed before every fit, so each forest is reproducible
  predictions[k] <- predict(randomForest(Species ~ Sepal.Length,
                                         data = irisData[-k,], mtry = 1),
                            type = "prob",
                            newdata = irisData[k,, drop = FALSE])[2]
}
auc(irisData$Species, predictions, direction = "<",
    levels = levels(irisData$Species))

Area under the curve: 0.776
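To see that the seed is what makes the runs repeatable, the following minimal sketch fits the same forest on all 60 rows three times. The first two fits each follow set.seed(1), and the third fit follows them without resetting the seed. The fit_* and p* names are illustrative.

```r
library(randomForest)

irisData <- iris[1:60, ]
irisData$Species <- as.factor(as.character(irisData$Species))

# Fit the same forest twice, each time right after set.seed(1)
set.seed(1)
fit_1 <- randomForest(Species ~ Sepal.Length, data = irisData, mtry = 1)
set.seed(1)
fit_2 <- randomForest(Species ~ Sepal.Length, data = irisData, mtry = 1)

# No reset: this fit continues the random stream, so its bootstrap samples differ
fit_3 <- randomForest(Species ~ Sepal.Length, data = irisData, mtry = 1)

p1 <- predict(fit_1, newdata = irisData, type = "prob")
p2 <- predict(fit_2, newdata = irisData, type = "prob")
p3 <- predict(fit_3, newdata = irisData, type = "prob")

identical(p1, p2)  # TRUE: same seed, same forest
max(abs(p1 - p3))  # typically > 0: a different forest
```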

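For caret, pass the seeds through the seeds argument of trainControl. It takes a list with one element per resample plus one extra element at the end, which is the seed used to fit the final model. Each per-resample element needs one integer per tuning-parameter combination, and here there is only one (mtry = 1). LOOCV on 60 rows means 60 resamples, so the list has 61 elements, all set to 1 to mirror the loop above: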

library('caret')

fitControl <- trainControl(
  method = 'LOOCV',
  number = 1,
  savePredictions = 'final',
  classProbs = TRUE,
  seeds = as.list(rep(1, 61)),   # 60 resamples + 1 for the final model
  summaryFunction = twoClassSummary
)

train(Species ~ Sepal.Length, data = irisData, method = 'rf',
      tuneGrid = data.frame(mtry = 1), trControl = fitControl)

60 samples
 1 predictor
 2 classes: 'setosa', 'versicolor' 

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 59, 59, 59, 59, 59, 59, ... 
Resampling results:

  ROC    Sens  Spec
  0.776  1     0.6
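The all-ones list works because there is exactly one tuning value. If tuneGrid had M rows, each of the first 60 elements would need M integers, while the last element stays a single integer. The following minimal sketch builds the list in that shape. make_loocv_seeds, n_rows and n_models are illustrative names, not caret functions.

```r
# Build a seeds list for trainControl(method = "LOOCV", seeds = ...):
# one integer vector of length n_models per resample (one resample per row),
# plus a single integer for the final model fit.
make_loocv_seeds <- function(n_rows, n_models, seed = 1L) {
  seeds <- vector("list", n_rows + 1)
  for (i in seq_len(n_rows)) {
    seeds[[i]] <- rep(as.integer(seed), n_models)
  }
  seeds[[n_rows + 1]] <- as.integer(seed)
  seeds
}

seeds_one   <- make_loocv_seeds(n_rows = 60, n_models = 1)  # same shape as as.list(rep(1, 61))
seeds_three <- make_loocv_seeds(n_rows = 60, n_models = 3)  # e.g. a tuneGrid with three rows

length(seeds_one)                   # 61
unique(lengths(seeds_three[1:60]))  # 3
```

Using the same seed everywhere mirrors the set.seed(1) loop. For distinct seeds per resample, you could call set.seed once and fill each element with sample.int(10000, n_models) instead.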