Solved – Which parameter of GBM does gradient descent update after calculating the gradient of the loss function?

boosting, cart, gradient descent, machine learning, optimization

I am going through The Elements of Statistical Learning and trying to understand the GBM (gradient boosting machine) algorithm.

The GBM algorithm is shown below.

[image: Algorithm 10.3 (Gradient Tree Boosting) from ESL]
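For reference, the algorithm reads (notation as in ESL; $L$ is the loss function, $N$ the number of training points, $M$ the number of trees):

1. Initialize $f_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$.
2. For $m = 1$ to $M$:
   (a) Compute the pseudo-residuals $r_{im} = -\left[\partial L(y_i, f(x_i)) / \partial f(x_i)\right]_{f = f_{m-1}}$ for $i = 1, \dots, N$.
   (b) Fit a regression tree to the targets $r_{im}$, giving terminal regions $R_{jm}$, $j = 1, \dots, J_m$.
   (c) For each region, compute $\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, f_{m-1}(x_i) + \gamma)$.
   (d) Update $f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} \, I(x \in R_{jm})$.
3. Output $\hat{f}(x) = f_M(x)$.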

I understand the general gradient descent algorithm, shown below, very well.

[image: generic gradient descent update for parameters $\theta_j$]
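That is, the standard parameter-space update with step size $\rho$:

$$\theta_j \leftarrow \theta_j - \rho \, \frac{\partial L(\theta)}{\partial \theta_j}$$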

Questions

  1. Which parameter (theta j in the picture above) of GBM is gradient descent updating with each new tree that is added? Can you explain the GBM algorithm intuitively in this context?
  2. What is the gamma in the GBM algorithm, and what is the intuition behind it?
  3. It seems gamma is calculated for each terminal region of each tree. What does it mean/do?
  4. GBM does not reweight the training samples, unlike AdaBoost, which does. True or false?

Best Answer

  1. GBM performs gradient descent in function space rather than in parameter space: the "parameters" being updated are the current predictions $f(x_i)$ at the training points. At each iteration, the negative gradient of the loss with respect to those predictions gives the pseudo-residuals, and the next tree is fit to them. The residuals can be thought of as the step direction (see the sketch after this list).
  2. Gamma is the result of a line search along that step direction: within each terminal node, it is the constant that minimizes the loss, and it becomes that node's prediction for the iteration. It can be thought of as the step size.
  3. You are correct that gamma is computed for every terminal region. This is only because the base learner in GBM is a decision tree: a tree partitions the input space into disjoint regions, so a separate step size can be chosen for each region instead of a single one for the whole step.
  4. True, although GBM supports the exponential loss function, which Friedman, Hastie, and Tibshirani showed makes boosting equivalent to AdaBoost.
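To make points 1-3 concrete, here is a minimal from-scratch sketch of gradient tree boosting for squared-error loss. The toy data, tree depth, and learning rate are arbitrary illustrative choices, and scikit-learn's DecisionTreeRegressor stands in for the base learner:

```python
# Gradient tree boosting sketch for squared-error loss L = (y - F)^2 / 2.
# The "parameter" updated by gradient descent is the prediction vector F,
# the pseudo-residuals are the step direction, and gamma is a per-leaf
# step size.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_trees, learning_rate = 50, 0.1
F = np.full_like(y, y.mean())   # step 1: f_0 = argmin_gamma sum_i L(y_i, gamma)

trees = []
for m in range(n_trees):
    # step 2a: negative gradient of L w.r.t. F -> pseudo-residuals (direction)
    residuals = y - F
    # step 2b: fit a tree to the direction; its leaves define regions R_jm
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # step 2c: one gamma per terminal region = argmin of the loss in that leaf
    # (for squared error this is just the mean residual in the leaf, which is
    # also what the tree stores as its leaf value, so no extra line search)
    leaf = tree.apply(X)
    gamma = {j: residuals[leaf == j].mean() for j in np.unique(leaf)}
    # step 2d: take the step: F <- F + learning_rate * gamma_{leaf(x)}
    F += learning_rate * np.array([gamma[j] for j in leaf])
    trees.append((tree, gamma))

print("training MSE:", np.mean((y - F) ** 2))
```

For other losses (absolute error, exponential, deviance) only two pieces change: the formula for the pseudo-residuals and the per-leaf minimization for gamma; for squared error the two happen to coincide with what the regression tree already computes.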