Is Random Forest a Boosting Algorithm? – Detailed Explanation

Tags: bagging, boosting, machine learning, random forest

Short definition of boosting:

Can a set of weak learners create a single strong learner? A weak
learner is defined to be a classifier which is only slightly
correlated with the true classification (it can label examples better
than random guessing).

Short definition of Random Forest:

Random forests grow many classification trees. To classify a new
object from an input vector, put the input vector down each of the
trees in the forest. Each tree gives a classification, and we say the
tree "votes" for that class. The forest chooses the classification
having the most votes (over all the trees in the forest).
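
To make sure I am reading that correctly, here is how I picture the voting step (just a sketch, assuming a list of already-fitted trees with a scikit-learn-style `predict` method, not Breiman's actual implementation):

```python
from collections import Counter

def forest_classify(trees, x):
    """Run the input vector x down every tree and return the class
    that receives the most votes across the forest."""
    votes = [tree.predict([x])[0] for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```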

Another short definition of Random Forest:

A random forest is a meta-estimator that fits a number of decision
tree classifiers on various sub-samples of the dataset and uses
averaging to improve the predictive accuracy and control over-fitting.
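
That second description matches the scikit-learn estimator, which I have been using along these lines (a minimal sketch on a toy dataset, assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is fit on a bootstrap sub-sample of the data;
# their predictions are then combined across the forest.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```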

As I understand it, Random Forest is a boosting algorithm that uses trees as its weak classifiers. I know that it also uses other techniques and improves upon them. However, somebody corrected me, saying that Random Forest is not a boosting algorithm.

Can someone elaborate on this: why is Random Forest not a boosting algorithm?

Best Answer

Random Forest is a bagging algorithm rather than a boosting algorithm. They are two opposite ways of achieving a low error.

We know that error can be decomposed into bias and variance. An overly complex model has low bias but high variance, while an overly simple model has low variance but high bias; both lead to a high error, but for different reasons. As a result, two different ways of attacking the problem suggest themselves (to Breiman and others): variance reduction for complex models, and bias reduction for simple models. The former corresponds to random forests, the latter to boosting.
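
For squared-error loss, the decomposition being appealed to here is the standard one:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Reducing either of the first two terms without increasing the other lowers the expected error.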

Random forest reduces the variance of a large number of "complex" models with low bias. Its component models are not "weak" learners at all; they are, if anything, overly complex. If you read the algorithm, the underlying trees are grown roughly as large as possible. The trees are independent, parallel models, and random feature selection at each split is introduced to make them even less correlated with one another; this is what makes a random forest perform better than ordinary bagging and is what earns it the name "random".
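
To make that concrete, here is a rough sketch, assuming scikit-learn (the dataset is a synthetic toy, and `max_features="sqrt"` is the per-split feature randomness mentioned above; both models use fully grown trees):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# One fully grown tree: low bias, high variance.
single_tree = DecisionTreeClassifier(max_depth=None, random_state=0)

# Many fully grown trees, each fit on a bootstrap sample with a random
# subset of features considered at every split, combined by voting:
# roughly the same bias, much lower variance.
forest = RandomForestClassifier(n_estimators=200, max_depth=None,
                                max_features="sqrt", random_state=0)

print("single tree:", cross_val_score(single_tree, X, y, cv=5).mean())
print("forest     :", cross_val_score(forest, X, y, cv=5).mean())
```

On data like this the forest typically scores noticeably higher than the single tree, which is the variance reduction at work.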

Boosting, in contrast, reduces the bias of a large number of "small" models with low variance. These are the "weak" learners you quoted. The component models form a kind of "chain": a nested, iterative construction in which each level targets the bias left over by the previous one. So they are not independent, parallel models; each model is built on top of all the earlier small models, typically by re-weighting the training examples or fitting the current residuals. That one-by-one, sequential construction is what "boosting" refers to.
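
For contrast, here is a toy sketch of the sequential idea, written as residual fitting in the spirit of gradient boosting for regression (AdaBoost re-weights examples instead, but the chained structure is the same); depth-1 stumps play the role of the weak models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

learning_rate = 0.1
prediction = np.zeros_like(y, dtype=float)
stumps = []

# Each new weak model is fit on the residuals left by all the previous
# ones, so the models form a chain rather than independent parallel fits.
for _ in range(100):
    residuals = y - prediction
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residuals)
    prediction += learning_rate * stump.predict(X)
    stumps.append(stump)

print("training MSE after 100 rounds:", np.mean((y - prediction) ** 2))
```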

Breiman's papers and books discuss trees, random forests, and boosting at length. They will help you understand the principles behind these algorithms.