Solved – Calibrated boosted decision trees in R or MATLAB

Tags: classification, MATLAB, r

In "An Empirical Comparison of Supervised Learning Algorithms" (ICML 2006), the authors (Rich Caruana and Alexandru Niculescu-Mizil) evaluated several classification algorithms (SVMs, ANNs, KNN, random forests, decision trees, etc.) and reported that calibrated boosted trees ranked as the best learning algorithm overall across eight different metrics (F-score, ROC area, average precision, cross-entropy, etc.).

I would like to test calibrated boosted decision trees in one of my projects, and was wondering if anybody could suggest a good R package or MATLAB library for this.

I am relatively new to R, although I have extensive experience with MATLAB and Python. I have read about R's gbm, tree, and rpart packages, but I am not sure whether they implement calibrated boosted decision trees or whether other packages do.

Thanks

Best Answer

For R, I would vote for the gbm package; there's a vignette that provides a good overview: Generalized Boosted Models: A guide to the gbm package. If you are looking for a unified interface to ML algorithms, I recommend the caret package, which has built-in facilities for data preprocessing, resampling, and comparative assessment of model performance. Other packages for boosted trees are listed in Table 1 of one of its accompanying vignettes, Model tuning, prediction and performance functions. There is also an example of parameter tuning for boosted trees in the JSS paper, pp. 10-11.
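As far as I know, gbm fits the boosted trees but does not add a separate calibration step; in the Caruana and Niculescu-Mizil paper, "calibrated" means the raw scores were post-processed with Platt scaling or isotonic regression. Below is a minimal sketch of the Platt-scaling variant, assuming synthetic data and a three-way split (train / calibration / test). The data, variable names, and tuning values are all made up for illustration and are not taken from the paper or the vignettes.

```r
# Sketch: boosted trees with gbm, followed by Platt scaling on held-out data.
# Everything below (data, split, tuning values) is a hypothetical illustration.
library(gbm)

set.seed(42)

# Synthetic binary-classification data.
n  <- 3000
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- runif(n)
p  <- plogis(1.5 * x1 - x2 + sin(4 * x3))
dat <- data.frame(y = rbinom(n, 1, p), x1 = x1, x2 = x2, x3 = x3)

# Three disjoint splits: fit the booster, fit the calibrator, evaluate.
idx   <- sample(rep(c("train", "calib", "test"), length.out = n))
train <- dat[idx == "train", ]
calib <- dat[idx == "calib", ]
test  <- dat[idx == "test", ]

# Boosted trees for a 0/1 outcome (distribution = "adaboost" is another option).
n_trees <- 500
fit <- gbm(y ~ x1 + x2 + x3, data = train, distribution = "bernoulli",
           n.trees = n_trees, interaction.depth = 3, shrinkage = 0.05,
           n.minobsinnode = 10, verbose = FALSE)

# Raw scores on the link scale for the calibration and test sets.
score_calib <- predict(fit, newdata = calib, n.trees = n_trees, type = "link")
score_test  <- predict(fit, newdata = test,  n.trees = n_trees, type = "link")

# Platt scaling: logistic regression of the label on the raw score,
# fitted on data the booster has not seen.
platt <- glm(y ~ score, family = binomial(),
             data = data.frame(y = calib$y, score = score_calib))

# Calibrated and uncalibrated probabilities on the test set.
p_cal <- predict(platt, newdata = data.frame(score = score_test),
                 type = "response")
p_raw <- predict(fit, newdata = test, n.trees = n_trees, type = "response")

# Brier score (mean squared error of the probabilities); lower is better.
brier <- function(prob, y) mean((prob - y)^2)
cat("Brier score, raw:       ", brier(p_raw, test$y), "\n")
cat("Brier score, calibrated:", brier(p_cal, test$y), "\n")
```

The calibrator is fitted on a split the booster never saw, because fitting it on the training scores would tend to make it overconfident. For the isotonic-regression variant, base R's isoreg() could replace the glm() step, at the cost of needing more calibration data.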

Note: I haven't checked, but you could also look into Weka (there's an R interface, RWeka).