I have a question about leave-one-out cross-validation (LOOCV).
As far as I know, there is only one way to perform LOOCV for a model: test each of the N elements against the model trained on the other N-1 elements.
So this should compute a LOOCV AUC:
library('randomForest')
library('pROC')  # for the ROC curve and AUC
# Keep the first 60 rows (50 setosa, 10 versicolor) and drop the unused level
irisData <- iris[1:60, ]
irisData$Species <- droplevels(irisData$Species)
predictions <- numeric(60)
for (k in 1:60) {
  # Train on all rows except k, then predict the class probabilities of row k
  fit <- randomForest(Species ~ Sepal.Length, data = irisData[-k, ], mtry = 1)
  predictions[k] <- predict(fit, type = "prob", newdata = irisData[k, , drop = FALSE])[2]
}
auc(irisData$Species, predictions, direction = "<", levels = levels(irisData$Species))
Area under the curve: 0.776
Repeating the code, I always obtain the same value.
Using caret, I run:
library('caret')
fitControl <- trainControl(
  method = 'LOOCV',                  # leave-one-out cross-validation
  number = 1,                        # number of folds or resampling iterations
  savePredictions = 'final',         # save predictions for the optimal tuning parameter
  classProbs = TRUE,                 # return class probabilities
  summaryFunction = twoClassSummary  # two-class summary (ROC, sensitivity, specificity)
)
train(Species ~ Sepal.Length, data = irisData, method = 'rf', tuneGrid = data.frame(mtry = 1), trControl = fitControl)
Here the AUC varies between 0.770 and 0.780 from run to run.
I tried changing number to 60, but the result is the same.
What is causing the difference?
Best.
Best Answer
Yes. randomForest is a randomized algorithm, so you need to set the seed to get reproducible results.
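The answer's original code did not survive here, so the following is only a minimal sketch of what setting the seed for the manual loop could look like. It assumes the seed is reset to the same value before every fit; the helper name loocv_auc is hypothetical and not part of the original post.

```r
library(randomForest)
library(pROC)

# Same two-class subset as in the question (50 setosa, 10 versicolor)
irisData <- droplevels(iris[1:60, ])

# Hypothetical helper: LOOCV AUC with the RNG reseeded before every fit
loocv_auc <- function(seed) {
  predictions <- numeric(nrow(irisData))
  for (k in seq_len(nrow(irisData))) {
    set.seed(seed)  # assumption: same seed for each of the 60 fits
    fit <- randomForest(Species ~ Sepal.Length, data = irisData[-k, ], mtry = 1)
    # Probability of the second class (versicolor) for the held-out row
    predictions[k] <- predict(fit, newdata = irisData[k, , drop = FALSE],
                              type = "prob")[2]
  }
  as.numeric(auc(irisData$Species, predictions, direction = "<",
                 levels = levels(irisData$Species)))
}

auc1 <- loocv_auc(1)
auc2 <- loocv_auc(1)
identical(auc1, auc2)  # two runs with the same seed should agree
```

Resetting the seed inside the loop, and not only once before it, makes each individual fold reproducible on its own, which is the behaviour the per-resample seeds in caret are meant to reproduce.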
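For caret, you need to pass a list of integer seeds through the seeds argument of trainControl. The list needs one element per resample, plus a last element that is the seed used for the final model. With 60 leave-one-out resamples that makes a list of 61 elements, and setting all of them to 1 uses the same seed for every fit.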
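As a hedged sketch of that setup (the answer's own code is missing here): the seeds list below has 61 entries, all 1, and is passed through the seeds argument of trainControl. The names loocv_seeds and fit_once are hypothetical, number is left out, and metric = "ROC" is added so that train selects on the metric reported by twoClassSummary.

```r
library(caret)

# Same two-class subset as in the question
irisData <- droplevels(iris[1:60, ])

# 60 leave-one-out resamples + 1 final model = 61 seeds.
# Each of the first 60 entries needs one integer per tuning-grid row (one here).
loocv_seeds <- as.list(rep(1L, nrow(irisData) + 1))

fitControl <- trainControl(
  method = "LOOCV",
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  seeds = loocv_seeds
)

# Hypothetical helper so the same call can be repeated
fit_once <- function() {
  train(Species ~ Sepal.Length, data = irisData, method = "rf",
        metric = "ROC", tuneGrid = data.frame(mtry = 1),
        trControl = fitControl)
}

roc1 <- fit_once()$results$ROC
roc2 <- fit_once()$results$ROC
identical(roc1, roc2)  # repeated runs should now report the same AUC
```

If the tuning grid had more than one row, each of the first 60 list elements would need to be an integer vector with one seed per row, while the last element stays a single integer.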