Solved – Does XGBoost use feature subsetting

boosting

Random forests select a random subset of the features at each node and only consider those features as candidate splits. Wikipedia says this is sometimes called feature bagging.
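
To make the question concrete, here is a rough sketch of the per-node feature subsetting I mean (pure illustration only, not any library's actual implementation; the "sqrt" rule is just one common choice):

```python
# Illustration of per-node feature subsetting ("feature bagging"):
# each node draws its own random subset of feature indices, and only
# those features are considered as candidate splits.
import numpy as np

def candidate_features(n_features, rng, max_features="sqrt"):
    """Pick the features a single node is allowed to split on."""
    k = max(1, int(np.sqrt(n_features))) if max_features == "sqrt" else n_features
    return rng.choice(n_features, size=k, replace=False)

rng = np.random.default_rng(0)
print(candidate_features(20, rng))  # e.g. 4 of the 20 feature indices
```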

Does XGBoost also use this technique when growing an individual tree, or does it consider all features when evaluating candidate splits?

I read the original paper on XGBoost [1] and couldn't tell from it. Section 2.3 of that paper mentions "column sub-sampling", which appears to be the same thing, and it notes that column sub-sampling is used in random forests, but it doesn't explicitly indicate whether XGBoost itself uses it.

References:
[1] XGBoost: A Scalable Tree Boosting System. Tianqi Chen, Carlos Guestrin. KDD'16, arXiv:1603.02754.

Best Answer

See the XGBoost parameter documentation and search for colsample_bytree.

colsample_bytree [default=1]

subsample ratio of columns when constructing each tree.

range: (0,1]

By default, XGBoost uses all features when constructing each tree. You can set the colsample_bytree parameter to have each tree built on a random subset of the columns.
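
For example, with the xgboost Python package you can pass colsample_bytree in the training parameters (a minimal sketch; the dataset and the 0.5 value are placeholders, not recommendations):

```python
# Minimal sketch: train a booster that samples 50% of the columns per tree.
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "max_depth": 4,
    "colsample_bytree": 0.5,  # default 1.0 = use all columns for each tree
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```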