Solved – Random Forest and Decision Tree Algorithm

bagging, cart, machine-learning, random-forest

A random forest is a collection of decision trees following the bagging concept. When we move from one decision tree to the next, how does the information learned by the last tree carry forward to the next one?

Because, as per my understanding, there is no trained model that gets created for each decision tree and then loaded before the next tree starts learning from the misclassification errors.

So how does it work?

Best Answer

No information is passed between trees. In a random forest, all of the trees are identically distributed because each one is grown with the same randomization strategy: first take a bootstrap sample of the data, then grow the tree using splits chosen from a randomly selected subset of features at each node. This happens for each tree individually, without attention to any other tree in the ensemble. The trees are correlated, but purely by virtue of being trained on samples drawn from a common pool of training data; multiple samples from the same data set will tend to be similar, so the trees will encode some of that similarity.
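A minimal sketch may make the independence concrete. It builds a small forest by hand, assuming scikit-learn and NumPy are available; the synthetic data set, the tree count, and the max_features="sqrt" choice are illustrative assumptions, not a reference implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(100):
    # Bootstrap: draw n rows with replacement from the training set.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random feature
    # subset; nothing from any previously grown tree is consulted.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1 << 30)))
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Prediction is a majority vote over the independent trees.
votes = np.stack([t.predict(X) for t in trees])
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", (forest_pred == y).mean())
```

Note that the loop body never looks at a previously grown tree; you could train all 100 trees in parallel and assemble the same forest.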

You might find it helpful to read an introduction to random forests from a high-quality source. One is Leo Breiman's original paper "Random Forests" (Machine Learning, 2001); there's also a chapter in The Elements of Statistical Learning by Hastie et al.

It's possible that you've confused random forests with boosting methods such as AdaBoost or gradient-boosted trees. Boosting is different: it uses information about the misfit from previous boosting rounds to inform the next round. See: Is random forest a boosting algorithm?
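For contrast, here is a minimal AdaBoost-style sketch (again an illustrative assumption using scikit-learn decision stumps, not a reference implementation) in which each round explicitly depends on the previous round's errors:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_pm = 2 * y - 1                     # relabel {0, 1} -> {-1, +1}
w = np.full(len(X), 1.0 / len(X))    # start with uniform sample weights

stumps, alphas = [], []
for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1)  # weak learner
    stump.fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y_pm].sum()                  # weighted training error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    # Upweight the points this round misclassified; the NEXT stump is
    # fit against these weights -- this is the sequential information
    # flow that random forests do not have.
    w = w * np.exp(-alpha * y_pm * pred)
    w = w / w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote of all stumps.
score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
boost_pred = np.sign(score)
print("training accuracy:", (boost_pred == y_pm).mean())
```

The sample-weight update is exactly the "information moving forward" the question asks about, and it has no analogue in a random forest.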
