Solved – Is the exponential loss function the only reason for AdaBoost being an adaptive algorithm?

adaboost, boosting, ensemble learning, loss-functions

The main concept of AdaBoost is that on each iteration the algorithm learns which samples were difficult to classify and increases their weights, while decreasing the weights of those that were easy to classify. That is where the name "adaptive boosting" comes from. A minimal sketch of one such reweighting round follows below.
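To make the reweighting step concrete, here is a minimal sketch of one round of discrete AdaBoost (the helper name `adaboost_weight_update` and the NumPy formulation are my own, for illustration only):

```python
import numpy as np

def adaboost_weight_update(w, y_true, y_pred):
    """One round of discrete AdaBoost sample reweighting.

    w      : current sample weights, shape (n,), summing to 1
    y_true : true labels in {-1, +1}
    y_pred : weak learner predictions in {-1, +1}
    """
    miss = y_pred != y_true
    err = w[miss].sum()                        # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this weak learner
    # "Difficult" (misclassified) samples are up-weighted,
    # "easy" (correctly classified) ones are down-weighted.
    w = w * np.exp(-alpha * y_true * y_pred)
    return w / w.sum(), alpha                  # renormalise to sum to 1
```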

My question: what makes the AdaBoost algorithm so different from generic Gradient Boosting? Is the exponential loss the only reason that leads to this "adaptive" behaviour?

Best Answer

The main difference between AdaBoost and other "generic" boosting algorithms is that AdaBoost uses the (deviance) residuals as sample weights, while "generic" gradient boosting algorithms use the residuals themselves as the learning target; a sketch contrasting the two appears below.
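As an illustration of that contrast, here is one boosting round done both ways on a toy problem (the data, the squared-error pseudo-residuals, and the shrinkage value 0.1 are my own choices for simplicity, not code from either reference):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)     # toy labels in {-1, +1}

# AdaBoost-style round: the residual information enters via *sample weights*;
# the weak learner is always refit on the original labels y.
w = np.full(len(y), 1 / len(y))
stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
miss = stump.predict(X) != y
err = w[miss].sum()
alpha = 0.5 * np.log((1 - err) / err)
w *= np.exp(np.where(miss, alpha, -alpha))     # up-weight the misclassified samples
w /= w.sum()

# "Generic" GBM round (squared-error loss): the residuals *are* the next target.
F = np.zeros(len(y))                           # current additive model
residuals = y - F                              # pseudo-residuals = negative gradient
tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
F += 0.1 * tree.predict(X)                     # shrunken step along the fitted residuals
```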

This difference between AdaBoost and other "generic" Gradient Boosting Machine (GBM) methodologies is most prominent when we examine a "generic" GBM as an additive model whose solution is found iteratively via the backfitting algorithm (see Elements of Statistical Learning, Hastie et al. (2009), Ch. 10.2 "Boosting Fits an Additive Model" for the relation between boosting and additive models in more detail). To that extent, it is worth looking into LogitBoost, as it effectively bridges AdaBoost and the "generic" GBM framework. The "LogitBoost paper" (Additive logistic regression: a statistical view of boosting, Friedman et al. (2000)) has a specific section (Sect. 4, "AdaBoost: an additive logistic regression model") that interprets AdaBoost as a stage-wise estimation procedure for fitting an additive logistic regression model. It shows how AdaBoost minimises the expectation of the exponential loss $E\{ e^{-y F(x)} \}$ via an iterative approach.
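For completeness, the stage-wise argument in that section (also in ESL, Ch. 10.4) can be summarised as follows; the notation is mine but the derivation is standard. At stage $m$, with $F_{m-1}$ held fixed, AdaBoost picks the next weak learner $G$ and coefficient $\beta$ by

$$(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} e^{-y_i \left( F_{m-1}(x_i) + \beta G(x_i) \right)} = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} e^{-\beta y_i G(x_i)},$$

where $w_i^{(m)} = e^{-y_i F_{m-1}(x_i)}$ depends on neither $\beta$ nor $G$. Solving gives

$$\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}, \qquad w_i^{(m+1)} = w_i^{(m)} \, e^{-\beta_m y_i G_m(x_i)},$$

which is exactly AdaBoost's reweighting rule: the sample weights are the per-sample exponential-loss contributions of the current fit, so the exponential loss and the "adaptive" weighting are two views of the same procedure.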