Solved – Gradient Boosting Tree vs Random Forest

boosting, cart, ensemble-learning, machine-learning, random-forest

Gradient tree boosting, as proposed by Friedman, uses decision trees as base learners. I'm wondering whether we should make the base decision trees as complex as possible (fully grown) or simpler. Is there any explanation for this choice?

Random Forest is another ensemble method that uses decision trees as base learners.
Based on my understanding, we generally use almost fully grown decision trees in each iteration. Am I right?

Best Answer

$\text{expected error} = \text{bias}^2 + \text{variance} + \text{irreducible error}$

  • Boosting is based on weak learners (high bias, low variance). In terms of decision trees, weak learners are shallow trees, sometimes even as small as decision stumps (trees with two leaves). Boosting reduces error mainly by reducing bias (and also to some extent variance, by aggregating the output from many models).
  • On the other hand, Random Forest uses, as you said, fully grown decision trees (low bias, high variance). It tackles the error-reduction task the opposite way: by reducing variance. The trees are decorrelated to maximize the decrease in variance, but the algorithm cannot reduce bias (which is slightly higher than the bias of an individual tree in the forest). Hence the need for large, unpruned trees, so that the bias is as low as possible to begin with (a short code sketch after this list illustrates the two depth settings).

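To make the contrast concrete, here is a minimal scikit-learn sketch; the dataset and the particular hyperparameter values are illustrative assumptions, not part of the original answer. The boosting model is configured with shallow (weak) trees, while the random forest grows its trees out fully and decorrelates them with random feature subsets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data purely for illustration, not a benchmark.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Boosting: many shallow trees; each learner has high bias and low variance,
# and the sequential fitting drives the bias down.
gbm = GradientBoostingClassifier(
    n_estimators=500,
    max_depth=2,        # shallow trees (near-stumps) are the usual choice
    learning_rate=0.1,
    random_state=0,
)

# Random Forest: deep, unpruned trees grown independently; each learner has
# low bias and high variance, and averaging drives the variance down.
rf = RandomForestClassifier(
    n_estimators=500,
    max_depth=None,       # grow the trees out fully (the default)
    max_features="sqrt",  # random feature subsets decorrelate the trees
    random_state=0,
)

for name, model in [("gradient boosting", gbm), ("random forest", rf)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Tuning `max_depth` in either model trades bias against variance exactly as described above: deeper trees lower the bias of each learner but raise its variance.
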
Please note that, unlike Boosting (which is sequential), RF grows its trees in parallel and independently of one another, so the term "iteration" you used is not quite accurate.
