Solved – AdaBoost – update of weights

adaboost boosting self-study

I am self-studying AdaBoost and reading the following useful article: http://www.inf.fu-berlin.de/inst/ag-ki/adaboost4.pdf . I am trying to understand the following two questions about the algorithm shown below:

1) When we select and extract a classifier from the pool, do we extract it from a fixed, pre-existing pool (e.g. an existing pool of the first 100 trees), or do we (as I presume) create the optimal classifier from scratch (e.g. a new tree with different splitting variables)?

2) I am failing to see why step 3 (the update of weights) holds – how do we know that the new weights are the old weights multiplied by $e^{a_m}$ in the case of a hit?

[Image: the AdaBoost algorithm as presented in the linked article]

Best Answer

For 1), in a sense, both. Each round trains a new classifier from scratch, but you can equally view that training as selecting the best classifier from the "pool" defined as the range (i.e. the collection of all classifiers the algorithm could possibly produce) of the classification algorithm.
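To make the "range as pool" picture concrete, here is a minimal sketch for decision stumps, where the pool is small enough to enumerate outright. The function `best_stump` and the toy data are my own illustration, not code from the article:

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the decision stump that minimizes weighted 0/1 error.

    Enumerating every (feature, threshold, sign) combination is literally
    searching the "pool" of all stumps the algorithm could produce.
    """
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w[pred != y])  # weighted error of this stump
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    return best, best_err

# toy data: labels in {-1, +1}, uniform initial weights
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=20) > 0, 1, -1)
w = np.full(20, 1 / 20)
print(best_stump(X, y, w))
```

For richer base learners (full decision trees, say) the pool is far too large to enumerate, so the training algorithm acts as a heuristic search over it, but the interpretation is the same.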

For 2), this re-weighting scheme is simply part of the definition of the AdaBoost algorithm. A reasonable question is, of course: why this choice? Reweighting in this way allows one to bound the training error by an exponentially decreasing function. Here is Theorem 3.1 from *Boosting: Foundations and Algorithms* by Schapire and Freund:
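For reference, here is what one round of that reweighting looks like in the common formulation (as in Schapire and Freund). The article may fold the two exponential factors into a single normalization or scale only the misclassified points, but up to normalization these are all equivalent. A minimal sketch:

```python
import numpy as np

def reweight(w, y, pred, eps=1e-12):
    """One round of the standard AdaBoost weight update.

    w    : current weights (a probability distribution over examples)
    y    : true labels in {-1, +1}
    pred : weak classifier's predictions in {-1, +1}
    """
    err = np.clip(np.sum(w[pred != y]), eps, 1 - eps)  # weighted error e_m
    alpha = 0.5 * np.log((1 - err) / err)              # classifier weight a_m
    w_new = w * np.exp(-alpha * y * pred)              # e^{-a_m} on a hit, e^{+a_m} on a miss
    return w_new / w_new.sum(), alpha                  # renormalize to a distribution

# tiny usage example: four points, one mistake
w = np.full(4, 0.25)
y = np.array([1, 1, -1, -1])
pred = np.array([1, -1, -1, -1])
w_new, alpha = reweight(w, y, pred)
print(alpha, w_new)  # the misclassified point gains weight
```

Note that since $y_i h(x_i) = +1$ on a hit and $-1$ on a miss, the single factor $e^{-a_m y_i h(x_i)}$ captures both cases of the update you asked about.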

Given the notation of Algorithm 1.1 (AdaBoost), let $\lambda_t = \frac{1}{2} - e_t$, where $e_t$ is the weighted training error of the $t$-th weak classifier, and let $D_1$ be any initial distribution over the training set. Then the weighted training error of the combined classifier $H$ with respect to $D_1$ is bounded as

$$ \Pr_{i \sim D_1}\left( H(x_i) \neq y_i \right) \leq \exp\left( -2 \sum_t \lambda_t^2 \right) $$

You can use this to show that, if your base (weak) classifiers have a fixed edge over random guessing (i.e. a bias towards being correct, no matter how small, so that $\lambda_t \geq \gamma > 0$ for all $t$), then AdaBoost drives the training error down exponentially fast. The proof of this inequality uses the weight-update relation (3) in a fundamental way.
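To see how strong that bound is, here is a quick back-of-the-envelope computation; the edge $\gamma = 0.1$ (weak learners with error at most $0.4$) and the target error are just assumed example values:

```python
import math

gamma = 0.1    # assumed fixed edge: each weak learner has weighted error <= 0.4
target = 0.01  # desired bound on the training error

# the theorem gives: training error <= exp(-2 * T * gamma**2) after T rounds,
# so solve exp(-2 * T * gamma**2) < target for the number of rounds T
T = math.ceil(math.log(1 / target) / (2 * gamma ** 2))
print(T, math.exp(-2 * T * gamma ** 2))  # 231 rounds give a bound of ~0.0098
```

So even with quite poor weak learners, a few hundred rounds suffice to push the bound on the training error below 1%.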

I should note that there is nothing obvious about the algorithm. I'm sure it took years and years of meditation and pots and pots of coffee for it to arrive at its final form, so there is nothing wrong with an initial "???" reaction to the setup.