Solved – Conditional Inference Random Forest

Tags: party, r, random forest

I use cforest, a function of the R package party, to fit a conditional inference random forest. However, I don't understand how this function computes the predicted values for a regression problem.

Could you explain how this works, please?
Can I get the median of the terminal-node means as output instead?
Can I find out which explanatory variables are most significant with this function?

Best Answer

The cforest function constructs a forest of conditional inference trees; see help("cforest", package = "party") for further details and references. In short, the conditional inference trees (Hothorn et al. 2006a) are grown "in the usual way" on bootstrap samples or subsamples, with only a random subset of variables available for splitting in each node. For predictions, a suitably weighted mean of the observed responses is constructed (Hothorn et al. 2006b): each training observation is weighted by how often it falls into the same terminal node as the new observation across the trees of the forest. The same weights could also be used to obtain other aggregations, such as the median or other quantiles. However, this is not provided by default.
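To make the weighted-mean idea concrete, here is a minimal Python sketch of the adaptive nearest-neighbour weighting described by Hothorn et al. (2006b). It is not party's implementation. The input layout is hypothetical: `train_leaves[b][i]` is the terminal node of training observation i in tree b, `new_leaves[b]` is the terminal node the new observation reaches in tree b, and `case_weights[b][i]` is how often observation i was drawn for tree b. The helper names `forest_weights`, `weighted_mean` and `weighted_quantile` are likewise illustrative. The same weights give either the mean (the default kind of prediction) or a median/quantile.

```python
def forest_weights(train_leaves, new_leaves, case_weights):
    """Weight of each training observation for one new observation:
    summed over trees, the observation's case weight whenever it shares
    the new observation's terminal node (hypothetical input layout)."""
    n = len(train_leaves[0])
    w = [0.0] * n
    for leaves, node, cw in zip(train_leaves, new_leaves, case_weights):
        for i in range(n):
            if leaves[i] == node:
                w[i] += cw[i]
    return w


def weighted_mean(y, w):
    """Weighted mean of the observed responses."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def weighted_quantile(y, w, p=0.5):
    """Smallest response whose cumulative weight reaches p * total weight
    (p = 0.5 gives a weighted median); zero-weight responses are skipped."""
    pairs = sorted((yi, wi) for yi, wi in zip(y, w) if wi > 0)
    total = sum(wi for _, wi in pairs)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= p * total:
            return yi
    return pairs[-1][0]


# Toy forest: 2 trees, 5 training observations.
y = [1, 2, 3, 10, 20]
train_leaves = [[0, 0, 1, 1, 1], [0, 1, 1, 0, 1]]
case_weights = [[1, 1, 1, 2, 0], [0, 1, 2, 1, 1]]
new_leaves = [1, 1]

w = forest_weights(train_leaves, new_leaves, case_weights)  # [0, 1, 3, 2, 1]
print(weighted_mean(y, w))         # 51/7, about 7.29
print(weighted_quantile(y, w))     # 3
```

In this toy example the weighted median (3) is much less affected by the single large response (20) than the weighted mean. That is one reason to aggregate the weights differently when the response is skewed.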

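While conditional inference trees employ significance tests to select the split variables and split points, there are no classical significance tests (p-values) for the explanatory variables in the forest as a whole. However, various flavors of variable importance measures are available (Strobl et al. 2007, 2008). In party they are computed by varimp(), and conditional = TRUE gives the conditional variable importance of Strobl et al. (2008).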
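The permutation idea behind these importance measures can be sketched in Python as follows. This is a simplified illustration, not party's code. It uses a generic `predict` callable (a hypothetical stand-in for a fitted forest) on a single data set, whereas the forest-based measures in the cited papers work tree by tree on out-of-bag observations. The importance of column `j` is the mean increase in squared error after its values are shuffled. The optional `groups` argument is an assumed, simplified stand-in for the conditional scheme: values are only shuffled among rows that share a group label, mimicking permutation within strata of correlated covariates.

```python
import random


def permutation_importance(predict, X, y, j, nperm=10, groups=None, seed=0):
    """Mean increase in mean squared error when column j of X is permuted.

    predict: callable mapping a list of rows (lists) to a list of predictions
    groups:  optional group label per row; if given, column j is shuffled
             only within each group (simplified 'conditional' permutation)
    """
    rng = random.Random(seed)

    def mse(rows):
        preds = predict(rows)
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    base = mse(X)
    if groups is None:
        groups = [0] * len(X)  # one group: ordinary (marginal) permutation

    # Row indices belonging to each group.
    blocks = {}
    for i, g in enumerate(groups):
        blocks.setdefault(g, []).append(i)

    increases = []
    for _ in range(nperm):
        col = [row[j] for row in X]
        for idx in blocks.values():
            vals = [col[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                col[i] = v
        permuted = []
        for i, row in enumerate(X):
            new_row = list(row)
            new_row[j] = col[i]
            permuted.append(new_row)
        increases.append(mse(permuted) - base)
    return sum(increases) / nperm


# A "model" that only uses the first column.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [row[0] for row in X]
predict = lambda rows: [r[0] for r in rows]

print(permutation_importance(predict, X, y, 0))  # positive: column 0 matters
print(permutation_importance(predict, X, y, 1))  # 0.0: column 1 is ignored
```

Restricting the shuffle to groups keeps the permuted variable's relationship with the grouping covariates intact. This is the intuition behind why conditional importance is less inflated for variables that are merely correlated with relevant ones.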

References:

  • Torsten Hothorn, Kurt Hornik, Achim Zeileis (2006a). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.

  • Torsten Hothorn, Peter Bühlmann, Sandrine Dudoit, Annette Molinaro, Mark Van Der Laan (2006b). Survival Ensembles. Biostatistics, 7(3), 355-373.

  • Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8(25).

  • Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, Achim Zeileis (2008). Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9(307).
