Solved – How Gradient Descent is used for classification with Decision Trees

boosting · gradient · gradient descent

I'm not able to see how we use gradient descent to minimize the loss in binary classification with decision trees.

What I understood is that we first have a model (a decision tree) that tries to predict our y values. Our model makes classification errors, so we fit a new decision tree on our errors (the observations where we made wrong classifications?) to correct our model, and we add this new decision tree to the previous one. Then we check the classification error of the newly combined model and repeat the process until we have almost no error.
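The loop I'm describing can be sketched roughly like this. This is a toy, regression-style version: it uses squared-error residuals as the "errors" and a hand-rolled one-split stump as the base learner (a real implementation would use full trees and a proper classification loss):

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals: try each
    threshold, predict the mean residual on each side, keep the split
    with the smallest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop max so the right side is never empty
        left  = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

# Toy data: class 0 for small x, class 1 for large x.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

pred = [0.0] * len(xs)  # start from a constant model
ensemble = []
for _ in range(5):
    residuals = [y - p for y, p in zip(ys, pred)]    # the current "errors"
    stump = fit_stump(xs, residuals)                 # fit a new tree to them
    ensemble.append(stump)
    pred = [p + stump(x) for x, p in zip(xs, pred)]  # add it to the model
```

Each round fits the new learner to what the current ensemble still gets wrong, then adds it to the running sum of predictions.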

Then, when do we use gradient descent, and how is it used to tune the new decision trees? And when we build a new tree "on our errors", does that mean the observations where we made an error?

I'm sorry if I'm not clear; I'm still a bit confused about how exactly it works.

Thank you in advance for your help.

Best Answer

Gradient descent is not used for training decision trees. Not every machine learning algorithm uses a general-purpose optimization algorithm (e.g. gradient descent) for training; some use specialized training procedures instead. Examples are $k$-NN, naive Bayes, and decision trees. We don't train these by directly minimizing some loss function to find the best set of parameters (in fact, $k$-NN and decision trees don't have parameters per se); each has its own algorithm for finding a solution.
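For example, a decision tree chooses each split by greedy exhaustive search: try every candidate threshold and keep the one with the lowest impurity. No gradients are involved. A toy sketch (illustrative only, not any real library's code):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Try every candidate threshold on a single feature and keep the
    one with the lowest weighted impurity -- greedy search, no gradient."""
    best = (float("inf"), None)
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, t)
    return best

# Toy data: the classes separate cleanly between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
score, threshold = best_split(xs, ys)
```

A full tree-growing algorithm (CART, C4.5, ...) simply applies this kind of search recursively to each resulting subset.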

As a side note, gradient descent is not even always used for the algorithms that do train by directly minimizing a loss function. It is only one of many optimization algorithms, and not always the most efficient one; other algorithms work better for some problems. Gradient descent became popular largely because it is easy and efficient to use for training neural networks, but that does not make it a "one size fits all" algorithm.
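For contrast, here is the kind of model where gradient descent is the natural training algorithm: a differentiable model with a continuous parameter, such as this one-parameter logistic regression on made-up toy data:

```python
import math

# Toy separable data: negative x -> class 0, positive x -> class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w = 0.0    # the single parameter we are optimizing
lr = 0.5   # learning rate
for _ in range(200):
    grad = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-w * x))  # sigmoid prediction
        grad += (p - y) * x             # d(log-loss)/dw for this point
    w -= lr * grad / len(xs)            # gradient descent step
# w ends up positive, so the model predicts p > 0.5 exactly for x > 0.
```

Here the loss is smooth in `w`, so following its gradient downhill makes sense. A decision tree's "parameters" (which feature, which threshold) are discrete choices, so there is no gradient to descend.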
