Gradient Boosting vs AdaBoost – Intuitive Explanations of Differences

adaboost, boosting

I'm trying to understand the differences between GBM & AdaBoost.

These are what I've understood so far:

  • They are both boosting algorithms that learn from the previous models' errors and finally combine the models into a weighted sum (written out in symbols below).
  • GBM and AdaBoost are pretty similar except for their loss functions.
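
If I write that down, the final model in both cases seems to be a weighted sum of $M$ weak learners (with $h_m$ the $m$-th weak learner and $\alpha_m$ its weight, as I understand it):

$$F(x) = \sum_{m=1}^{M} \alpha_m h_m(x)$$

and each new $h_m$ is chosen by looking at where the current ensemble still does badly.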

But I still find it hard to grasp the differences between them.
Can someone give me intuitive explanations?

Best Answer

I found this introduction, which provides some intuitive explanations:

  • In Gradient Boosting, ‘shortcomings’ (of existing weak learners) are identified by gradients.
  • In AdaBoost, ‘shortcomings’ are identified by high-weight data points.

Through its exponential loss function, AdaBoost gives more weight to the samples that were fitted worst in previous steps. Today, AdaBoost is regarded as a special case of Gradient Boosting in terms of the loss function. Historically, it preceded Gradient Boosting, to which it was later generalized, as the history given in the introduction shows:
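
To make the contrast concrete, here is a minimal toy sketch of the two update rules, assuming scikit-learn decision stumps as the weak learners and squared loss for the gradient-boosting half; the toy data, the ten rounds, and the 0.1 learning rate are arbitrary illustrative choices, not something taken from the introduction:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # labels in {-1, +1}

# --- AdaBoost: "shortcomings" = high-weight data points ---------------
w = np.full(len(y), 1 / len(y))                        # start with uniform weights
ada_F = np.zeros(len(y))
for m in range(10):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum() / w.sum()                 # weighted error of this stump
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))    # this learner's vote
    ada_F += alpha * pred
    w *= np.exp(-alpha * y * pred)                     # exponential loss: up-weight mistakes
    w /= w.sum()

# --- Gradient Boosting: "shortcomings" = (negative) gradients ---------
# With squared loss the negative gradient is just the residual y - F(x);
# choosing another loss only changes what the next learner is fit to.
gb_F = np.zeros(len(y))
for m in range(10):
    residual = y - gb_F                                # -dL/dF for L = (y - F)^2 / 2
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    gb_F += 0.1 * tree.predict(X)                      # small learning rate

print("AdaBoost train accuracy:", np.mean(np.sign(ada_F) == y))
print("GBM      train accuracy:", np.mean(np.sign(gb_F) == y))
```

The only structural difference between the two loops is where the feedback goes: AdaBoost pushes it into the sample weights, while Gradient Boosting pushes it into the target that the next learner is trained on.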

  1. Invent AdaBoost, the first successful boosting algorithm [Freund et al., 1996, Freund and Schapire, 1997]
  2. Formulate AdaBoost as gradient descent with a special loss function [Breiman et al., 1998, Breiman, 1999] (this reformulation is sketched right after the list)
  3. Generalize AdaBoost to Gradient Boosting in order to handle a variety of loss functions [Friedman et al., 2000, Friedman, 2001]
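
The reformulation in step 2 can be sketched in a couple of lines (using the usual convention $y \in \{-1, +1\}$ and writing $F_{m-1}$ for the ensemble after $m-1$ rounds). AdaBoost's exponential loss is

$$L(y, F(x)) = e^{-y F(x)},$$

and the stagewise objective at round $m$ factors as

$$\sum_{i=1}^{n} e^{-y_i \left( F_{m-1}(x_i) + \alpha h(x_i) \right)} = \sum_{i=1}^{n} w_i^{(m)} e^{-\alpha y_i h(x_i)}, \qquad w_i^{(m)} = e^{-y_i F_{m-1}(x_i)}.$$

So AdaBoost's sample weights are exactly the per-example exponential losses of the current ensemble, and the negative gradient $-\partial L / \partial F = y \, e^{-y F}$ picks out the same points, only scaled by the label. Gradient Boosting keeps this stagewise recipe but lets you swap $e^{-yF}$ for any differentiable loss.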