Solved – setting max_features to None in Random Forest

cart, machine learning, random forest, scikit learn

In sklearn's Random Forest classifier, setting max_features to None means all $p$ features are considered as split candidates when building each tree. In that case, how is it different from applying bagging to simple CART? And isn't feature sampling the USP of Random Forest?

Best Answer

In bagging, the only parameter we tune is the number of trees. In Random Forest, we tune both the number of trees and the number of input variables $m \leq p$ considered for splitting each node. If $m = p$, we are bagging. Typically we choose $m \ll p$.

I assume by "USP" you mean "unique selling proposition"? If so, yes: considering a random subset of features at each split is the main characteristic differentiating Random Forest from bagging. The fact that one *could* set $m = p$ doesn't mean one *should*, and it doesn't invalidate random feature subsetting as a "selling point" of Random Forest.

One typically uses $\sqrt{p}$ features, at least for classification problems.
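To make the relationship concrete, here is a small sketch using scikit-learn's `RandomForestClassifier`: with `max_features=None` every split sees all $p$ features, which reduces the forest to bagged CART trees, while `max_features="sqrt"` gives the usual $\sqrt{p}$ behaviour for classification (the dataset here is synthetic, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset with p = 20 features (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# m = p: every split considers all features -> this is bagging of CART trees
bagged = RandomForestClassifier(
    n_estimators=100, max_features=None, random_state=0
).fit(X, y)

# m = sqrt(p): the typical Random Forest choice for classification
rf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
).fit(X, y)

print(bagged.score(X, y), rf.score(X, y))
```

The only difference between the two ensembles is the per-split feature subsampling; both still bootstrap the training rows for each tree.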