You can specify method="none" in trainControl. For example:
library(caret)
train(Species ~ ., data = iris, method = "rf", tuneGrid = data.frame(mtry = 3),
      trControl = trainControl(method = "none"))
I'm not sure when this was implemented.
The summary printed for the model contains the line
6 0.76 0.68 0.0507 0.068
which tells you that the expected/average accuracy for a properly cross-validated experiment (training kept separate from testing) should be about 0.76.
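If it helps, the same numbers are also stored in the fitted object itself. A minimal sketch, assuming model is the object returned by train() and that it was fit with an actual resampling method (not method="none"):
# resampled performance for each candidate tuning value:
# columns are mtry, Accuracy, Kappa, AccuracySD, KappaSD
model$results
model$results[model$results$mtry == 6, c("Accuracy", "Kappa")]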
I have never used the line
model$pred[model$pred$mtry == 6, c("pred", "obs")]
before, but I guess it is giving you the aggregated held-out results of all the internal cross-validations done when testing mtry = 6. You get 0.7893916, which is pretty close to 0.76.
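For reference, that number can be reproduced from the held-out predictions along these lines (a sketch; it assumes the model was trained with savePredictions = TRUE in trainControl, otherwise model$pred is not populated):
# proportion of held-out predictions that match the observed class for mtry == 6
held_out <- model$pred[model$pred$mtry == 6, c("pred", "obs")]
mean(held_out$pred == held_out$obs)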
Caret, by default, also fits a final model on all of the training data provided, and that is the model used in the line
pred=predict(model, data_pred_scale),
so what is curious is that the random forest gets 100% accuracy when tested on the data used to train it. It is not impossible, of course, but it is curious.
This phenomenon is not technically overfitting; it goes beyond that. I do not know of any good reason to test a classifier on the data used to train it.
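If you want an honest performance estimate without relying only on the internal resampling, a minimal sketch of a proper hold-out evaluation (using the iris data from the example above; the seed and 80/20 split are arbitrary choices):
library(caret)
set.seed(42)
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[idx, ]
testing  <- iris[-idx, ]
fit <- train(Species ~ ., data = training, method = "rf")
# evaluate only on data the model has never seen
confusionMatrix(predict(fit, newdata = testing), testing$Species)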
Best Answer
Generally speaking, the Area Under the ROC curve (AUROC) statistic is used when you have imbalanced classes, for example 5% 1's and 95% 0's.
In practice, we are more interested in the AUROC to judge how well the model rank-orders cases (i.e., ranks them from high probability to low probability of being a 1), whereas Accuracy is... well, you already know that.
In the context of model tuning, my advice would be to use AUC (especially if you have imbalanced classes) instead of Accuracy.
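In caret this amounts to computing class probabilities and switching the summary function, roughly like this (a sketch for a two-class problem; my_two_class_data and Class are placeholder names, and the outcome must be a two-level factor whose levels are valid R names):
library(caret)
# twoClassSummary reports ROC, Sens and Spec instead of Accuracy/Kappa
ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
fit <- train(Class ~ ., data = my_two_class_data, method = "rf",
             metric = "ROC", trControl = ctrl)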