randomForest vs. cforest: can I get partial dependence plots and percent variance explained in the party package?

r, random forest

I have a data set with 24 predictor variables, all continuous, but on different scales and potentially collinear. I'm trying to decide whether to use randomForest, or cforest from the party package with conditional permutation importance.

I recognize that I should probably use cforest if I want to overcome variable selection bias, but I find the ability to get partial dependence plots and percent variance explained from the randomForest package to be quite appealing.

Does anyone know whether it is possible to get partial dependence plots and percent variance explained from cforest?

Also, it appears that ctree uses a significance test to select variables; is this the same for cforest? And how might I get these significance values for each variable in cforest?

Best Answer

My package edarf will calculate partial dependence for predictors in a cforest fit. You can also get permutation importance using the varimp function in the party package.
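For anyone who wants to see the mechanics without relying on a helper package, here is a minimal sketch in R. It uses only functions from party itself (`cforest`, `cforest_unbiased`, `varimp`, `predict`), and computes partial dependence by hand: fix one predictor at each grid value, predict for every row, and average. The toy data frame, the variable names `x1`..`x3`, and the small `ntree` are assumptions made for illustration, not the asker's data. The last few lines compute an out-of-bag pseudo R-squared, which is one plausible stand-in for randomForest's "% Var explained" rather than an official party output.

```r
library(party)

set.seed(1)

# Hypothetical toy data standing in for the real continuous predictors
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + sin(dat$x2) + rnorm(n, sd = 0.5)

# Conditional inference forest; ntree is kept small so the sketch runs quickly
fit <- cforest(y ~ ., data = dat,
               controls = cforest_unbiased(ntree = 100, mtry = 2))

# Permutation importance; conditional = TRUE permutes within groups defined by
# correlated predictors, which is slower than the default
vi <- varimp(fit, conditional = TRUE)

# Partial dependence of y on x1, computed by hand:
# set x1 to each grid value for all rows, predict, then average
grid <- seq(min(dat$x1), max(dat$x1), length.out = 20)
pd <- sapply(grid, function(v) {
  tmp <- dat
  tmp$x1 <- v
  mean(predict(fit, newdata = tmp))
})
plot(grid, pd, type = "l", xlab = "x1", ylab = "Average predicted y")

# Out-of-bag pseudo R^2 as an analogue of "% Var explained"
oob <- as.numeric(predict(fit, OOB = TRUE))
r2 <- 1 - sum((dat$y - oob)^2) / sum((dat$y - mean(dat$y))^2)
```

With 24 predictors, the same loop can be wrapped in a function and applied to each column name; conditional importance may take a while on larger data.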

Yes, cforest generates an ensemble of trees of the same form as ctree, with a random subset of features considered at each node and subsampling of the observations (by default). You control these parameters via cforest_control. If you download the source from the CRAN page you can see all the relevant code; most of it is written in C but is fairly readable.
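As a rough illustration of those settings, the sketch below builds a control object explicitly with `cforest_control` and passes it to `cforest`. The argument values are chosen to mirror what I understand `cforest_unbiased` to set (subsampling without replacement, a quadratic test statistic, no stopping threshold), so treat them as an assumption to check against `?cforest_control`; the toy data is hypothetical.

```r
library(party)

set.seed(2)

# Hypothetical toy data
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- dat$x1 + rnorm(n)

ctrl <- cforest_control(
  teststat     = "quad",   # form of the test statistic used for variable selection
  testtype     = "Univ",   # univariate p-values for each candidate predictor
  mincriterion = 0,        # no significance threshold: trees are grown deep
  ntree        = 50,       # number of trees in the ensemble
  mtry         = 1,        # predictors sampled as candidates at each node
  replace      = FALSE,    # subsample rather than bootstrap
  fraction     = 0.632     # share of observations drawn for each tree
)

fit <- cforest(y ~ ., data = dat, controls = ctrl)
fitted_vals <- predict(fit)
```

Raising `mincriterion` (for example to 0.95) makes each split require a correspondingly small p-value, which is the closest link between the forest and the significance test used by ctree.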