I think the question, while somewhat trivial and "programmatic" at first read, touches upon two main issues that are very important in modern Statistics:

- reproducibility of results and
- non-deterministic algorithms.

The reason for the different results is that the two procedures are trained using different random seeds. Random forests use a random subset of the full dataset's variables as candidates at each split (that is the `mtry` argument, which relates to the random subspace method) and also bag (bootstrap aggregate) the original dataset to decrease the variance of the model. These two internal random sampling procedures, though, are not deterministic between different runs of the algorithm. The random order in which the sampling is done is controlled by the random seeds used.
If the same seeds were used, one would get exactly the same results in both cases where the `randomForest` routine is called: internally in `caret::train` as well as externally when fitting a random forest manually. I attach a simple code snippet to showcase this. Please note that I use a very small number of trees (argument `ntree`) to keep training fast; it should generally be much larger.

```
library(caret)
library(randomForest)

# Simulate training and test data
set.seed(321)
trainData <- twoClassSim(5000, linearVars = 3, noiseVars = 9)
testData  <- twoClassSim(5000, linearVars = 3, noiseVars = 9)

# One seed vector per resample (5 folds x 5 repeats = 25) plus one
# final element used for fitting the final model
set.seed(432)
mySeeds <- sapply(simplify = FALSE, 1:26, function(u) sample(10^4, 3))

cvCtrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                       classProbs = TRUE, summaryFunction = twoClassSummary,
                       seeds = mySeeds)

fitRFcaret <- train(Class ~ ., data = trainData, trControl = cvCtrl,
                    ntree = 33, method = "rf", metric = "ROC")

# Use the same seed that caret used for its final-model fit
set.seed(unlist(tail(mySeeds, 1))[1])
fitRFmanual <- randomForest(Class ~ ., data = trainData,
                            mtry = fitRFcaret$bestTune$mtry, ntree = 33)
```

At this point both the `caret::train` object `fitRFcaret` and the manually defined `randomForest` object `fitRFmanual` have been trained using the same data and, importantly, using the same random seeds when fitting their final model. As such, when we try to predict using these objects, *and because we do no preprocessing of our data*, we will get the exact same answers.

```
all.equal(current = as.vector(predict(fitRFcaret, testData)),
          target  = as.vector(predict(fitRFmanual, testData)))
# TRUE
```

Just to clarify this last point a bit further: `predict(xx$finalModel, testData)` and `predict(xx, testData)` will be different if one sets the `preProcess` option when using `train`. When using the `finalModel` directly, on the other hand, it is equivalent to using the `predict` method of the fitted model (`predict.randomForest` here) instead of `predict.train`; no pre-processing takes place. Obviously, in the scenario outlined in the original question where no pre-processing is done, the results will be the same whether one uses the `finalModel`, the manually fitted `randomForest` object or the `caret::train` object.

```
all.equal(current = as.vector(predict(fitRFcaret$finalModel, testData)),
          target  = as.vector(predict(fitRFmanual, testData)))
# TRUE

all.equal(current = as.vector(predict(fitRFcaret$finalModel, testData)),
          target  = as.vector(predict(fitRFcaret, testData)))
# TRUE
```
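For contrast, here is a minimal sketch (not from the original post; the data, `ntree` and `mtry` values are invented for illustration) of what happens when `preProcess` *is* set: `predict.train` applies the centering/scaling to the new data, while `predict` on the `finalModel` sees the raw predictors, so the two calls generally no longer agree.

```
library(caret)
library(randomForest)

set.seed(321)
dat <- twoClassSim(1000, linearVars = 3, noiseVars = 9)
new <- twoClassSim(1000, linearVars = 3, noiseVars = 9)

# Fit with centering/scaling; method = "none" skips resampling so a
# single mtry value must be supplied
fitPP <- train(Class ~ ., data = dat, method = "rf", ntree = 33,
               preProcess = c("center", "scale"),
               tuneGrid = data.frame(mtry = 3),
               trControl = trainControl(method = "none"))

# predict.train pre-processes 'new'; predict on finalModel does not,
# so these will (in general) disagree
all.equal(as.vector(predict(fitPP, new)),
          as.vector(predict(fitPP$finalModel, new)))
```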

I would strongly suggest that you *always* set the random seed used by R, MATLAB or any other program. Otherwise, you can neither check the reproducibility of results (which, OK, might not be the end of the world) nor exclude a bug or an external factor affecting the performance of a modelling procedure (which, yeah, kind of sucks). Many of the leading ML algorithms (e.g. gradient boosting, random forests, extreme learning machines) employ internal resampling procedures during their training phase, so setting the random seed state prior to (or sometimes even within) the training phase can be important.
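To make the point concrete, a trivial base-R illustration of why fixing the seed matters: resetting the seed replays the exact same random stream, while letting the RNG state advance gives different draws.

```
set.seed(1234)
a <- sample(10^4, 5)   # draws depend on the current RNG state
b <- sample(10^4, 5)   # state has advanced, so these differ from 'a'

set.seed(1234)
a2 <- sample(10^4, 5)  # resetting the seed replays the same stream

identical(a, a2)  # TRUE
identical(a, b)   # FALSE
```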

Thanks @charles for pointing me to "positive". `positive = 1` did not work, though, since the `positive` argument only takes a character value in the function. But I was able to get what I wanted using the following:

```
levels(test_td$UpSell_Ind)
# [1] "0" "1"

confusionMatrix(data = predict_glm_vif_test, test_td$UpSell_Ind,
                positive = levels(test_td$UpSell_Ind)[2])
# Confusion Matrix and Statistics
#
#           Reference
# Prediction    0    1
#          0 8104 3241
#          1   89  289
#
#                Accuracy : 0.7159
#                  95% CI : (0.7077, 0.7241)
#     No Information Rate : 0.6989
#     P-Value [Acc > NIR] : 2.701e-05
#
#                   Kappa : 0.0952
#  Mcnemar's Test P-Value : < 2.2e-16
#
#             Sensitivity : 0.08187
#             Specificity : 0.98914
#          Pos Pred Value : 0.76455
#          Neg Pred Value : 0.71432
#              Prevalence : 0.30112
#          Detection Rate : 0.02465
#    Detection Prevalence : 0.03224
#       Balanced Accuracy : 0.53550
#
#        'Positive' Class : 1
```
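A self-contained toy version of the same fix (the factors here are invented, not the data from the post): `positive` must be passed as a character string matching one of the factor levels, e.g. `"1"`, not the number `1`.

```
library(caret)

# Invented toy factors with levels "0" and "1"
obs  <- factor(c(0, 1, 1, 0, 1, 0, 0, 1), levels = c("0", "1"))
pred <- factor(c(0, 1, 0, 0, 1, 1, 0, 1), levels = c("0", "1"))

# positive = 1 (numeric) errors; positive = "1" (character) works
cm <- confusionMatrix(data = pred, reference = obs, positive = "1")
cm$positive  # "1"
```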

## Best Answer

Most classification models in R produce both a class prediction and the probabilities for each class. For binary data, in almost every case, the class prediction is based on a 50% probability cutoff.

`glm` is the same. With `caret`, using `predict(object, newdata)` gives you the predicted class and `predict(object, newdata, type = "prob")` will give you class-specific probabilities (when `object` is generated by `train`).

You can do things differently by defining your own model and applying whatever cutoff you want. The `caret` website also has an example that uses resampling to optimize the probability cutoff.

tl;dr: `confusionMatrix` uses the predicted classes and thus a 50% probability cutoff.

Max
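As a sketch of the "apply your own cutoff" idea (the data, model and the 0.30 cutoff below are invented for illustration, not taken from the answer): get the class probabilities with `type = "prob"` and threshold them yourself instead of relying on the default 50% rule.

```
library(caret)

set.seed(100)
dat <- twoClassSim(500)  # classes "Class1" / "Class2"
fit <- train(Class ~ ., data = dat, method = "glm",
             trControl = trainControl(method = "none"))

# Default predict() labels use the 50% cutoff
defaultPred <- predict(fit, dat)

# Custom rule: call "Class1" whenever its probability exceeds 0.30
probs      <- predict(fit, dat, type = "prob")
cutoff     <- 0.30
customPred <- factor(ifelse(probs$Class1 > cutoff, "Class1", "Class2"),
                     levels = levels(dat$Class))

# Confusion matrices under the two rules will generally differ
confusionMatrix(customPred, dat$Class)
```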