Solved – Does the party package in R provide out-of-bag error estimates for random forest models?

Tags: machine learning, r, random forest

I'm a new R user, and also new to random forest modeling. I cannot figure out how to obtain the out-of-bag (OOB) error estimates for cforest models built with the party package in R. In the randomForest package, the OOB error estimate is displayed if you simply print the model object, but the party package doesn't work the same way.

Running a random forest model using the randomForest package:

> SBrf <- randomForest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr + DaylightHours + Bathy + Slope + MoonPhase + factor(Region), data = SBrfImpute, ntree = 500, replace = FALSE, importance = TRUE)
> print(SBrf)

Call:
 randomForest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr + DaylightHours + Bathy + Slope + MoonPhase + factor(Region),      data = SBrfImpute, ntree = 500, replace = FALSE, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 23.67%
Confusion matrix:
    0   1 class.error
0 823 127   0.1336842
1 211 267   0.4414226

Running a random forest model using the party package:

> SBcf <- cforest(formula = factor(SB_Pres) ~ SST + Chla + Dist2Shr + DaylightHours + Bathy + Slope + MoonPhase + factor(Region), data = bll_SB_noNA, controls = cforest_unbiased())
> print(SBcf)

Random Forest using Conditional Inference Trees
Number of trees:  500 

Response:  factor(SB_Pres) 
Inputs:  SST, Chla, Dist2Shr, DaylightHours, Bathy, Slope, MoonPhase, factor(Region) 
Number of observations:  534 

I've read through the manuals and vignettes but can't find an answer. Does anyone know how to retrieve OOB error estimates after fitting a random forest with the party package? Or am I missing some important difference between the two packages that means no OOB error estimates are available for cforest models?

Best Answer

The caret package can compute this for you. Use train as the interface (with caret and party loaded) and request out-of-bag resampling via trainControl. For example:

> mod1 <- train(Species ~ ., 
+               data = iris, 
+               method = "cforest", 
+               tuneGrid = data.frame(.mtry = 2),
+               trControl = trainControl(method = "oob"))
> mod1
150 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: 

Summary of sample sizes:  

Resampling results

  Accuracy  Kappa
  0.967     0.95 

Tuning parameter 'mtry' was held constant at a value of 2
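
The OOB quantities reported here are accuracy and Kappa rather than an error rate; the OOB error rate that randomForest prints is simply one minus the OOB accuracy:

> 1 - 0.967   # OOB error rate implied by the accuracy above
[1] 0.033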

Alternatively, there is an internal caret function you can use if you want to call cforest directly, but since it is not exported you have to call it with the ::: namespace operator:

> mod2 <- cforest(Species ~ ., data = iris,
+                 controls = cforest_unbiased(mtry = 2))
> caret:::cforestStats(mod2)
 Accuracy     Kappa 
0.9666667 0.9500000 
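
party's predict() method is also documented to return out-of-bag predictions for a cforest when newdata is left unset and OOB = TRUE, so you can compute an OOB error rate and confusion matrix yourself. A minimal sketch, assuming the same iris forest as above:

library(party)

# Fit the same conditional inference forest as mod2
mod3 <- cforest(Species ~ ., data = iris,
                controls = cforest_unbiased(mtry = 2))

# With newdata omitted, OOB = TRUE returns out-of-bag class predictions
oob_pred <- predict(mod3, OOB = TRUE)

# OOB misclassification (error) rate, the analogue of randomForest's printout
mean(oob_pred != iris$Species)

# OOB confusion matrix
table(observed = iris$Species, predicted = oob_pred)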

HTH,

Max