Solved – Bagging of xgboost

Tags: bagging, boosting, gradient, machine learning

The extreme gradient boosting algorithm seems to be widely applied these days. I often have the feeling that boosted models tend to overfit. I know that there are parameters in the algorithm to prevent this: according to the documentation, the parameters subsample and colsample_bytree can (among others) help prevent overfitting. But they do not serve the same purpose that bagging xgboosted models would, right?
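For reference, a minimal sketch of how those two parameters are set with the xgboost R package; the data and all parameter values here are purely illustrative, not recommendations:

```r
library(xgboost)

# Toy regression data (illustrative only)
set.seed(1)
x <- matrix(rnorm(6000 * 10), ncol = 10)
y <- drop(x %*% rnorm(10)) + rnorm(6000)

dtrain <- xgb.DMatrix(data = x, label = y)

# subsample: fraction of training rows sampled per boosting round
# colsample_bytree: fraction of columns sampled per tree
params <- list(
  objective        = "reg:squarederror",
  eta              = 0.1,
  max_depth        = 4,
  subsample        = 0.8,
  colsample_bytree = 0.8
)

fit <- xgb.train(params = params, data = dtrain, nrounds = 200)
```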

My question: would you apply bagging on top of xgboost to reduce the variance of the fit?

So far the question is statistical, but I dare to add a code detail: in case bagging makes sense, I would be happy about example code using the R package caret.
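If one did want to try it, bagging can also be written by hand without any special framework: fit B boosted models on bootstrap resamples and average their predictions. A hedged sketch below (the function name bag_xgb and all settings are illustrative; caret also ships a generic bag()/bagControl() wrapper if you prefer to stay inside that framework):

```r
library(xgboost)

# Bagging by hand: fit B boosted models on bootstrap resamples of the rows
# and average their predictions (the "aggregation" step).
bag_xgb <- function(x, y, newdata, B = 25,
                    params = list(objective = "reg:squarederror",
                                  eta = 0.1, max_depth = 4),
                    nrounds = 200) {
  preds <- sapply(seq_len(B), function(b) {
    idx    <- sample(nrow(x), replace = TRUE)                  # bootstrap resample
    dtrain <- xgb.DMatrix(data = x[idx, , drop = FALSE], label = y[idx])
    fit    <- xgb.train(params = params, data = dtrain, nrounds = nrounds)
    predict(fit, xgb.DMatrix(newdata))                         # one column of candidate answers
  })
  rowMeans(preds)                                              # aggregate by the mean
}
```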

EDIT after the remark: if we rely on the parameters alone to control overfitting, how can we best design the cross-validation? I have approx. 6000 data points and apply 5-fold cross-validation. What could improve the out-of-sample performance: moving to something like 10-fold cross-validation, or doing repeated 5-fold cross-validation? Just to mention: I use the package caret, where such strategies are implemented.
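For reference, both resampling strategies are a one-line change in caret's trainControl; a hedged sketch assuming a predictor matrix x and numeric response y as in the earlier snippet (the tuning settings are purely illustrative):

```r
library(caret)

# Repeated 5-fold cross-validation (5 folds, repeated 5 times)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
# For plain 10-fold cross-validation instead:
# ctrl <- trainControl(method = "cv", number = 10)

fit <- train(
  x = x, y = y,
  method     = "xgbTree",   # caret's xgboost wrapper
  trControl  = ctrl,
  tuneLength = 3            # small, purely illustrative tuning grid
)
```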

Best Answer

The bag in bagging is about aggregation. If you have k CART models, then for a given input you get k candidate answers. How do you reduce those to a single value? The aggregation does that; it is often a measure of central tendency like the mean or the mode.
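To make that concrete, a toy illustration in R (the numbers and labels are made up):

```r
# Five candidate answers for one input, from five hypothetical CART models
reg_preds <- c(3.1, 2.9, 3.4, 3.0, 3.2)
cls_preds <- c("yes", "no", "yes", "yes", "no")

mean(reg_preds)                      # aggregate by the mean: 3.12
names(which.max(table(cls_preds)))   # aggregate by the mode: "yes"
```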

In order to aggregate, you need multiple outputs. The gradient boosted machine (GBM), as in XGBoost, is a series ensemble, not a parallel one. This means that it lines the learners up in a bucket brigade: every learner (except the first and the last) takes the output of the previous one and hands its own output to the next. The final output has the same structure as that of a single CART model: one prediction. There is no aggregation to be done on a single output.
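A stripped-down sketch of that series structure, using plain gradient boosting on residuals with rpart stumps (purely illustrative, not XGBoost's actual implementation):

```r
library(rpart)

# Toy data
set.seed(1)
df <- data.frame(x = runif(500))
df$y <- sin(2 * pi * df$x) + rnorm(500, sd = 0.2)

eta  <- 0.1
pred <- rep(mean(df$y), nrow(df))   # start from a constant prediction
fits <- list()

for (m in 1:100) {
  df$resid  <- df$y - pred                      # each learner sees the current output...
  fits[[m]] <- rpart(resid ~ x, data = df,
                     control = rpart.control(maxdepth = 1))
  pred <- pred + eta * predict(fits[[m]], df)   # ...and hands the updated output to the next
}

# The whole chain still yields ONE prediction per input, so there is nothing to aggregate.
head(pred)
```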