Solved – Tree size in gradient tree boosting

Tags: boosting, cart, r

Gradient tree boosting as proposed by Friedman uses decision trees with J terminal nodes (i.e. leaves) as base learners. There are several ways to grow a tree with exactly J terminal nodes: for example, one can grow the tree in a depth-first fashion or in a breadth-first fashion.

Is there an established way to grow trees with exactly J terminal nodes for gradient tree boosting?

I examined the tree-growing procedure of R's gbm package, and it seems that it expands the tree in a depth-first fashion and uses a heuristic based on error improvement to choose whether to expand the left or the right child node — is that correct?

Best Answer

The solution in R's gbm is not a typical one.

Other packages, such as scikit-learn and LightGBM, use what scikit-learn calls a BestFirstTreeBuilder when the number of leaves is restricted. It maintains a priority queue of all current leaves and at each iteration splits the leaf whose best split yields the largest impurity decrease. So it is neither depth-first nor breadth-first, but a third strategy, driven by computations at the leaves.
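The idea can be sketched with a toy best-first grower for a regression tree. This is my own minimal illustration, not scikit-learn's or LightGBM's actual implementation: it keeps a max-heap of leaves keyed by the SSE decrease of each leaf's best split, and repeatedly splits the leaf at the top of the heap until the leaf budget is reached.

```python
import heapq

def sse(values):
    # Sum of squared errors around the mean of `values`.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, idx):
    """Best single-feature threshold split of the samples in `idx`,
    measured by decrease in SSE. Returns (gain, feature, threshold)."""
    parent = sse([y[i] for i in idx])
    best = (0.0, None, None)
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [y[i] for i in idx if X[i][j] <= t]
            right = [y[i] for i in idx if X[i][j] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t)
    return best

def best_first_grow(X, y, max_leaves):
    """Grow a tree best-first: always split the leaf whose best split
    gives the largest immediate impurity (SSE) decrease.
    Returns the final leaves as lists of sample indices."""
    root = list(range(len(y)))
    gain, j, t = best_split(X, y, root)
    heap = [(-gain, 0, root, j, t)]  # max-heap via negated gain
    done = []                        # leaves with no useful split left
    counter, n_leaves = 1, 1
    while heap and n_leaves < max_leaves:
        neg_gain, _, idx, j, t = heapq.heappop(heap)
        if j is None or -neg_gain <= 0:
            done.append(idx)         # nothing gained by splitting; freeze it
            continue
        left = [i for i in idx if X[i][j] <= t]
        right = [i for i in idx if X[i][j] > t]
        for child in (left, right):
            g, cj, ct = best_split(X, y, child)
            heapq.heappush(heap, (-g, counter, child, cj, ct))
            counter += 1
        n_leaves += 1                # one leaf became two
    return done + [entry[2] for entry in heap]
```

In scikit-learn this strategy kicks in when you pass `max_leaf_nodes` to a tree-based estimator; LightGBM's analogous parameter is `num_leaves` (it grows leaf-wise by default).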

In some sense, this approach is better than blindly splitting all the leaves in turn. However, it is still a greedy heuristic: the decision whether to split the $i$'th leaf now depends only on the best immediate split of that leaf, not on the possible successive splits, which may decrease impurity much more than the current one.
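A classic way to see the greediness is XOR-style data, where no single split reduces the impurity at all, even though two successive splits would fit the data perfectly. This toy check (my own construction) uses an SSE-based gain:

```python
def sse(values):
    # Sum of squared errors around the mean of `values`.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_gain(X, y, idx):
    """Largest immediate SSE decrease over all single-feature
    threshold splits of the samples in `idx`."""
    parent = sse([y[i] for i in idx])
    best = 0.0
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [y[i] for i in idx if X[i][j] <= t]
            right = [y[i] for i in idx if X[i][j] > t]
            best = max(best, parent - sse(left) - sse(right))
    return best

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.0, 1.0, 1.0, 0.0]   # y = x1 XOR x2
root = [0, 1, 2, 3]

# A greedy criterion sees zero gain for every possible first split:
print(best_gain(X, y, root))               # 0.0
# Yet after splitting on feature 0 anyway, each half is perfectly
# separable, so two successive splits would remove all the error:
print(best_gain(X, y, [0, 1]))             # 0.5
print(best_gain(X, y, [2, 3]))             # 0.5
```

So a greedy builder (best-first or otherwise) may refuse, or wrongly rank, a split whose real value only shows up one level deeper.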