Solved – AdaBoost with a weak versus a strong learner

adaboost, machine learning

Question 1) Why does AdaBoost with a weak learner (decision stumps) perform better than with a strong learner (a full decision tree)? I implemented both myself, but the accuracies were so close that the comparison was not convincing. Question 2) Why does AdaBoost accuracy depend on the number of rounds?

Best Answer

There can be many explanations for these questions; I will elaborate on only a few important aspects. AdaBoost belongs to the class of boosting algorithms.

The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only slightly better than random guessing, such as small decision trees) on repeatedly modified versions of the data. The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction. The data modifications at each so-called boosting iteration consist of applying weights w1, w2, ..., wN to each of the training samples. Initially, those weights are all set to wi = 1/N, so that the first step simply trains a weak learner on the original data. For each successive iteration, the sample weights are individually modified and the learning algorithm is reapplied to the reweighted data. At a given step, those training examples that were incorrectly predicted by the boosted model induced at the previous step have their weights increased, whereas the weights are decreased for those that were predicted correctly. As iterations proceed, examples that are difficult to predict receive ever-increasing influence. Each subsequent weak learner is thereby forced to concentrate on the examples that are missed by the previous ones in the sequence.
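To make that reweighting loop concrete, here is a minimal sketch of it for binary labels coded as -1/+1, using scikit-learn decision stumps as the weak learners. The function names, the number of rounds, and the label coding are illustrative assumptions, not something from the original post.

```python
# Minimal sketch of the AdaBoost reweighting loop described above.
# Assumes binary labels y in {-1, +1}; names are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start with uniform weights w_i = 1/N
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # train the weak learner on reweighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted error of this round
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # this learner's vote in the final sum
        w *= np.exp(-alpha * y * pred)             # up-weight misses, down-weight hits
        w /= w.sum()                               # renormalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted sum over all weak learners, then take the sign as the final vote.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```

Each stump only has to be slightly better than chance on the current weighting; the weighted vote is what turns the sequence into a strong classifier.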

Because of this procedure, boosted algorithms can often outperform single decision trees or bagged ensembles of trees such as Random Forest. A well-known example is XGBoost, which has largely displaced Random Forest in Kaggle competitions. The same procedure also answers Question 2: each round adds one more weak learner to the weighted vote, so accuracy generally improves as the number of rounds grows, until the gains level off or the model starts to fit noisy examples.
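The comparison in Question 1 is easy to reproduce with scikit-learn. The sketch below contrasts AdaBoost on stumps, AdaBoost on deeper base trees, and a Random Forest; the synthetic dataset and parameter values are placeholders, not the questioner's setup.

```python
# Hedged comparison sketch: AdaBoost with stumps vs. AdaBoost with deeper
# base trees vs. Random Forest. Dataset and settings are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "AdaBoost (stumps)": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),   # 'base_estimator' in older sklearn
        n_estimators=200, random_state=0),
    "AdaBoost (depth-5 trees)": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=5),
        n_estimators=200, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On small or easy datasets the cross-validated scores can indeed come out very close, which is consistent with what the questioner observed.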

But boosted algorithms have a downside too: compared to a single decision tree or a Random Forest, AdaBoost and XGBoost expose more hyperparameters that have to be tuned, such as the number of boosting rounds, the learning rate, and the depth of the base learner.
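A minimal sketch of tuning the two AdaBoost knobs most relevant to Question 2 is shown below; the grid values and dataset are arbitrary examples, not recommendations.

```python
# Sketch of tuning the number of rounds and the learning rate for AdaBoost.
# Grid values and the synthetic dataset are arbitrary examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200, 400],   # number of boosting rounds
    "learning_rate": [0.1, 0.5, 1.0],      # shrinks each learner's contribution
}
search = GridSearchCV(AdaBoostClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```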

For further reading, you can refer to the following links:

http://fastml.com/what-is-better-gradient-boosted-trees-or-random-forest/

https://www.quora.com/How-do-random-forests-and-boosted-decision-trees-compare