How to determine Accuracy/Classification rate with 5-fold CV in training and test set

caret, cart, cross-validation, r

I'm trying to use 5-fold CV with a training set and a test set, but I can't work out how to build the confusion matrix with table() to calculate the classification rate (true positives plus true negatives, divided by the total number of cases). The sets have different lengths, so I'm probably doing something wrong:

(after defining dataset using read.table and column names)

#turn dataset lines into random order for splitting
variavel <- runif(nrow(dataset))
dataset <- dataset[order(variavel),]

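#load packages: caret provides createFolds(), rpart provides rpart()
library(caret)
library(rpart)

#create 5 folds and a numeric vector to store the accuracy rates
folds <- createFolds(dataset$Class, k=5)
str(folds)
Accuracy <- numeric(5)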

ListFoldsTrain <- list()
ListFoldsTest <- list()

for (i in 1:5){

  #rows outside fold i form the training set
  trainingset <- dataset[-folds[[i]],]
  ListFoldsTrain[[i]] <- trainingset

  #rows in fold i form the test set
  testset <- dataset[folds[[i]],]
  ListFoldsTest[[i]] <- testset

  #run classification tree model
  tree.1 <- rpart(Region~palmitic+palmitoleic+stearic+oleic+linoleic+eicosanoic+linolenic+eicosenoic, data=trainingset)

  #Confusion matrix on the training set (predict() without newdata returns the fitted training-set classes)
  tabletree <- table(trainingset$Region, predict(tree.1, type="class"))

  #Training-set accuracy for each fold
  Accuracy[i] <- sum(diag(tabletree))/nrow(trainingset)

}

#Accuracy for each fold
print(Accuracy)

But now how can I get the accuracy of the tree model for the test set?

Best Answer

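It is almost the same as what you already have. You just need to pass the test data to predict() through the newdata argument; without it, predict() returns the fitted classes for the training data. The lines below go inside the loop, after the tree has been fitted.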

testtable <- table(testset$Region, 
    predict(tree.1, newdata=testset, type="class"))

#Test-set accuracy for each fold
Accuracy[i] <- sum(diag(testtable))/nrow(testset)
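For reference, here is a minimal end-to-end sketch that records both training and test accuracy in each fold. It rests on several assumptions that are not in the original post: the built-in iris data stands in for the question's dataset (with Species playing the role of Region), the folds are assigned with base R's sample() rather than caret's createFolds() (so they are not stratified by class), and the names fold_id, train_acc and test_acc are illustrative only.

```r
library(rpart)  # rpart is a recommended package bundled with standard R installs

set.seed(42)  # make the random fold assignment reproducible

# Stand-in data (assumption): iris, with Species in place of Region
dataset <- iris
k <- 5

# Randomly assign every row to one of k folds (unstratified, unlike createFolds)
fold_id <- sample(rep(seq_len(k), length.out = nrow(dataset)))

# Separate vectors so the test accuracy does not overwrite the training accuracy
train_acc <- numeric(k)
test_acc  <- numeric(k)

for (i in seq_len(k)) {
  trainingset <- dataset[fold_id != i, ]
  testset     <- dataset[fold_id == i, ]

  # Fit the classification tree on the training rows only
  tree.1 <- rpart(Species ~ ., data = trainingset, method = "class")

  # Predict on each set explicitly via newdata
  train_pred <- predict(tree.1, newdata = trainingset, type = "class")
  test_pred  <- predict(tree.1, newdata = testset, type = "class")

  # Proportion of correct predictions; equal to sum(diag(table))/n
  # when the rows and columns of the table share the same class levels
  train_acc[i] <- mean(as.character(train_pred) == as.character(trainingset$Species))
  test_acc[i]  <- mean(as.character(test_pred) == as.character(testset$Species))
}

print(train_acc)      # training accuracy per fold
print(test_acc)       # test accuracy per fold
print(mean(test_acc)) # cross-validated accuracy estimate
```

Comparing predictions to labels directly sidesteps the problem of sets with different lengths: each prediction vector is always as long as the data passed in newdata. Averaging the per-fold test accuracies gives the usual cross-validated estimate, while the training accuracies tend to be optimistic because the tree has already seen those rows.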