Random Forest Classifier – max_depth vs max_leaf_nodes in Scikit-Learn Explained

Tags: random-forest, scikit-learn

What's the difference, if any at all, between max_depth and max_leaf_nodes in sklearn's RandomForestClassifier for a simple binary classification problem?

If the model always grew trees symmetrically, one would assume that setting max_depth = 5 is equivalent to setting max_leaf_nodes = 32, since a full binary tree of depth 5 has 2^5 = 32 leaves.

The fact that sklearn gives us two options suggests that might not be the case.

Best Answer

As @whuber points out in a comment, a 32-leaf tree may have a depth larger than 5 (up to 31, if every split peels off a single leaf). To answer your follow-up question: yes, when max_leaf_nodes is set, sklearn builds the tree in a best-first fashion rather than a depth-first fashion.
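
One way to check this empirically is to fit two forests and compare the shape of their trees. The sketch below is only an illustration: the synthetic dataset from make_classification, the number of trees, and the random_state values are my own assumptions, not anything from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed toy data: a synthetic binary classification problem.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)


def tree_shapes(forest):
    """Return a (depth, leaf count) pair for every fitted tree in the forest."""
    return [(est.get_depth(), est.get_n_leaves()) for est in forest.estimators_]


# Same forest settings, but one is capped by depth and the other by leaf count.
by_depth = RandomForestClassifier(
    n_estimators=10, max_depth=5, random_state=0
).fit(X, y)
by_leaves = RandomForestClassifier(
    n_estimators=10, max_leaf_nodes=32, random_state=0
).fit(X, y)

depth_shapes = tree_shapes(by_depth)
leaf_shapes = tree_shapes(by_leaves)

print("max_depth=5       (depth, leaves):", depth_shapes)
print("max_leaf_nodes=32 (depth, leaves):", leaf_shapes)
```

With max_depth=5, no tree can be deeper than 5 or have more than 32 leaves. With max_leaf_nodes=32, the leaf count is capped at 32 but the depth is not capped at 5, so on most datasets you should expect to see some deeper, lopsided trees in the second list.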

From the docs (emphasis added):

max_leaf_nodes : int, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

and in the source code:

        # snipped from much earlier, line 231 in the permalink above:
        max_leaf_nodes = -1 if self.max_leaf_nodes is None else self.max_leaf_nodes
        ...
        # Use BestFirst if max_leaf_nodes given; use DepthFirst otherwise
        if max_leaf_nodes < 0:
            builder = DepthFirstTreeBuilder(
                ...
            )
        else:
            builder = BestFirstTreeBuilder(
                ...
            )
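
To see why best-first growth can produce deep, unbalanced trees, here is a toy sketch of the idea using a priority queue. This is not sklearn's BestFirstTreeBuilder: the toy_gain function is a made-up stand-in for the impurity reduction of a split (in a real tree it depends on the data), and nodes are just path strings such as "LLR".

```python
import heapq


def toy_gain(path):
    """Hypothetical impurity reduction for splitting the node at `path`.

    Made up for illustration: gains shrink with depth, and nodes reached
    through right branches are much less worth splitting.
    """
    return 0.5 ** path.count("R") * 0.9 ** len(path)


def grow_best_first(max_leaf_nodes):
    """Split the highest-gain frontier node until the leaf budget is used up."""
    # heapq is a min-heap, so store negated gains; "" is the root node.
    frontier = [(-toy_gain(""), "")]
    n_leaves = 1
    while n_leaves < max_leaf_nodes:
        _, path = heapq.heappop(frontier)  # most promising leaf so far
        for side in "LR":
            child = path + side
            heapq.heappush(frontier, (-toy_gain(child), child))
        n_leaves += 1  # one leaf was replaced by two children
    # Every node still on the frontier is a leaf of the final tree.
    return [path for _, path in frontier]


leaves = grow_best_first(max_leaf_nodes=8)
depth = max(len(path) for path in leaves)
print(f"{len(leaves)} leaves, depth {depth}")
```

With this particular gain function the builder keeps splitting the leftmost leaf, so the 8-leaf tree ends up with depth 7 rather than the depth 3 a balanced tree would have. A max_depth limit has no such freedom: it bounds every root-to-leaf path directly.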