Adaboost – Why Choose Adaboost with Decision Trees?

Tags: algorithms, boosting, classification, machine-learning

I've been reading a bit on boosting algorithms for classification tasks and Adaboost in particular. I understand that the purpose of Adaboost is to take several "weak learners" and, through a set of iterations on training data, push classifiers to learn to predict classes that the model(s) repeatedly make mistakes on. However, I was wondering why so many of the readings I've done have used decision trees as the weak classifier. Is there a particular reason for this? Are there certain classifiers that make particularly good or bad candidates for Adaboost?

Best Answer

I talked about this in an answer to a related SO question. Decision trees are just generally a very good fit for boosting, much more so than other algorithms. The bullet-point summary is this:

  1. Decision trees are non-linear. Boosting with linear models simply doesn't work well: a weighted sum of linear models is still a linear model, so adding more rounds of boosting buys you essentially no extra expressive power.
  2. The weak learner needs to be consistently better than random guessing. You don't normally need to do any parameter tuning on a decision tree to get that behavior. Training an SVM, by contrast, really does need a parameter search. And since the data is re-weighted on each iteration, you would likely need another parameter search on every iteration, so you are increasing the amount of work you have to do by a large margin.
  3. Decision trees are reasonably fast to train. Since we are going to be building 100s or 1000s of them, that's a good property. They are also fast to classify, which again matters when you need 100s or 1000s of them to run before you can output your decision.
  4. By changing the depth you have simple and easy control over the bias/variance trade-off, knowing that boosting can reduce bias but also significantly reduces variance. Boosting is known to overfit, so having an easy knob to tune is helpful in that regard (see the sketch after this list).
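
To make points 3 and 4 concrete, here is a minimal sketch using scikit-learn's `AdaBoostClassifier` with a `DecisionTreeClassifier` as the weak learner. The synthetic dataset, number of rounds, and the depths compared are purely illustrative; the point is just that `max_depth` of the base tree is the bias/variance knob you turn.

```python
# Minimal sketch: AdaBoost over decision trees, varying the tree depth.
# Dataset and hyperparameters are illustrative, not a recommendation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Depth-1 "stumps" are high-bias weak learners; deeper trees lower the bias
# of each round but make the boosted ensemble more prone to overfitting.
for depth in (1, 3):
    clf = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=depth),  # the weak learner
        n_estimators=200,
    )
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy {scores.mean():.3f}")
```

Note there is essentially nothing to tune on the tree itself beyond its depth, which is exactly the "no per-iteration parameter search" advantage from point 2.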