Question 1) Why does AdaBoost with a weak learner (decision stumps) perform better than a strong learner (a full decision tree)? I implemented both myself, but the accuracies were so close that the comparison was not convincing. Question 2) Why does AdaBoost's accuracy depend on the number of rounds?
Solved – Adaboost with a weak versus a strong learner
adaboost, machine learning
Related Solutions
So, boosting is a learning algorithm that can generate high-accuracy predictions by using, as a subroutine, another algorithm that can efficiently generate hypotheses only slightly better (by an inverse polynomial) than random guessing.
Its main advantage is speed.
When Schapire presented it in 1990 it was a breakthrough, in that it showed that a polynomial-time learner generating hypotheses with error just slightly below 1/2 can be transformed into a polynomial-time learner generating hypotheses with arbitrarily small error.
So, the theory to back up your question is in "The strength of weak learnability" (pdf), where he essentially showed that "strong" and "weak" learning are equivalent.
And perhaps the answer to the original question is: "there's no point constructing strong learners when you can construct weak ones more cheaply".
Among the more recent papers, there's "On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms" (pdf), which I don't understand but which seems related and may be of interest to more educated people :)
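As a rough empirical illustration of that weak-to-strong transformation (my own sketch, not part of the original answer): a single depth-1 stump is only modestly better than chance on a non-trivial problem, yet boosting many such stumps drives the training error down round by round, which also touches on Question 2. The dataset, seed, and hyperparameters below are arbitrary choices for illustration.

```python
# Sketch: a weak learner (stump) boosted toward low training error.
# Dataset and settings are illustrative assumptions, not tuned values.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)

# A lone stump is only slightly better than random guessing here.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("single stump training accuracy:", stump.score(X, y))

# Boosting the same stump for many rounds keeps lowering the training error.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, random_state=0).fit(X, y)

# staged_score yields the ensemble's accuracy after each boosting round.
for rounds, acc in enumerate(boosted.staged_score(X, y), start=1):
    if rounds in (1, 10, 50, 200):
        print(f"after {rounds:3d} rounds: training accuracy = {acc:.3f}")
```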
Random Forest is a bagging algorithm rather than a boosting algorithm. They are two opposite ways to achieve low error.
We know that error can be decomposed into bias and variance. An overly complex model has low bias but large variance, while an overly simple model has low variance but large bias; both lead to high error, but for two different reasons. As a result, two different remedies come to mind (perhaps due to Breiman and others): variance reduction for complex models, or bias reduction for simple models, which correspond to random forests and boosting respectively.
Random forest reduces the variance of a large number of "complex" models with low bias. The component models are not "weak" models but very complex ones. If you read about the algorithm, the underlying trees are grown roughly as large as possible. The underlying trees are independent, parallel models, and additional random variable selection is introduced to make them even more independent, which makes random forest perform better than ordinary bagging and earns it the name "random".
Boosting, in contrast, reduces the bias of a large number of "small" models with low variance. They are the "weak" models you quoted. The component models form something like a "chain" or "nested" iterative sequence, each one correcting the bias of the previous level, so they are not independent, parallel models; instead, each model is built, via reweighting, on top of all the earlier small models. That is the "boosting" that happens one model at a time.
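To make that contrast concrete, here is a hedged sketch (my own illustration, not from the answer above): a random forest averages many deep, low-bias trees to cut variance, while AdaBoost chains many shallow, low-variance stumps to cut bias. The dataset and parameters are arbitrary.

```python
# Sketch of the two ensemble strategies described above.
# Dataset and hyperparameters are illustrative assumptions, not tuned values.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging of complex, low-bias trees (variance reduction): trees are grown deep
# and decorrelated by random feature selection.
rf = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)

# Boosting of simple, high-bias stumps (bias reduction): each round reweights
# the data so the next stump focuses on the previous mistakes.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, random_state=0)

for name, model in [("random forest", rf), ("adaboost (stumps)", ada)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```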
Breiman's papers and books discuss trees, random forests and boosting at length; they will help you understand the principles behind these algorithms.
Best Answer
There can be many explanations for this question; I will elaborate on only a few important aspects. Adaboost belongs to the class of boosting algorithms.
The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only slightly better than random guessing, such as small decision trees) on repeatedly modified versions of the data. The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction. The data modifications at each so-called boosting iteration consist of applying weights w1, w2, ..., wN to each of the training samples. Initially, those weights are all set to wi = 1/N, so that the first step simply trains a weak learner on the original data. For each successive iteration, the sample weights are individually modified and the learning algorithm is reapplied to the reweighted data. At a given step, those training examples that were incorrectly predicted by the boosted model induced at the previous step have their weights increased, whereas the weights are decreased for those that were predicted correctly. As iterations proceed, examples that are difficult to predict receive ever-increasing influence. Each subsequent weak learner is thereby forced to concentrate on the examples that are missed by the previous ones in the sequence.

Because of this procedure, boosted algorithms can learn so well that they beat decision trees or ensembles of decision trees like RandomForest in almost all cases. The best example of this is Xgboost, which has largely replaced the use of RandomForest on Kaggle. But boosted algorithms have a downside too: you have to tune many more parameters for Adaboost or Xgboost than for DecisionTrees or RandomForest.
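The reweighting scheme described above can be written down in a few lines. What follows is a minimal sketch of discrete binary AdaBoost with stumps (my own illustration, not the implementation behind Xgboost or scikit-learn); it assumes labels in {-1, +1} and uses the standard vote-weight formula alpha = 0.5 * ln((1 - err) / err). The helper names adaboost_fit and adaboost_predict are hypothetical.

```python
# Minimal sketch of the reweighting loop described above (discrete binary AdaBoost).
# Assumes labels y are in {-1, +1}; formulas follow the standard algorithm.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initially w_i = 1/N
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)   # weighted error of this round
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))  # vote weight of this stump
        # Increase the weights of misclassified samples, decrease the others.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted majority vote: sign of the weighted sum of stump predictions.
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```

Each stump only needs to do slightly better than 1/2 weighted error for its alpha to be positive, which is exactly the "weak learner" requirement discussed earlier; running more rounds is what keeps reducing the bias, which is why the accuracy depends on the number of rounds.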
For further reading, you can refer to the following links:
http://fastml.com/what-is-better-gradient-boosted-trees-or-random-forest/
https://www.quora.com/How-do-random-forests-and-boosted-decision-trees-compare