Solved – the loss/cost function of decision trees

boosting, cart, ensemble-learning

In decision trees, splitting criteria such as information gain are applied to split the current tree node and build the tree. But in many machine learning problems, there is normally a cost/loss function that is minimised to obtain the best parameters.

My question is: how is such a cost function defined for a decision tree?

Best Answer

I think it helps to distinguish between training metrics and evaluation metrics, and between global and local training metrics. For evaluation metrics, as @AlvaroFuentes said, a loss function can always be defined for decision trees, in the same way as for any other model.

For training, it is true that often a global metric is chosen and training attempts to optimise that metric.* But training does not have to work this way, and in the case of decision trees, training proceeds through a greedy search, with each step based on a local metric (e.g., information gain or the Gini index). In fact, even when a global training metric is defined (say, likelihood), each training step is still evaluated against some local metric (say, the gradient of the likelihood) and is hence in a sense "greedy"; it is just that in this case the local metric is inspired by the global one. And in both cases there is no guarantee that a greedy search based on a local metric actually optimises the global metric (e.g., it may get stuck in a local optimum).
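
To make the "greedy search on a local metric" point concrete, here is a minimal sketch of CART-style split selection. The function names (`gini`, `best_split`) and the toy data are my own illustration, not anything from the question; the point is only that each split minimises a local impurity, with no global loss in sight:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy search: pick the (feature, threshold) pair that minimises
    the weighted child Gini impurity -- a purely local criterion."""
    n, d = X.shape
    best = (None, None, np.inf)
    for j in range(d):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            right = ~left
            if left.all() or right.all():
                continue  # degenerate split, skip
            score = (left.sum() * gini(y[left])
                     + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Toy usage: the chosen split optimises local impurity only; nothing
# guarantees it is optimal for any global loss over the full tree.
X = np.array([[2.0, 1.0], [3.0, 1.0], [10.0, 2.0], [11.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # -> (0, 3.0, 0.0): feature 0 split at 3.0, pure children
```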

* Side note: this training metric is often different from the evaluation metric, and is chosen for its nicer mathematical properties, which help training; e.g., likelihood, L2 loss, or cross-entropy versus accuracy or AUC.
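
A small numerical illustration of that side note (invented data, just to show the contrast): two classifiers can have identical accuracy yet very different cross-entropy, which is exactly why a smooth metric like cross-entropy is preferred for training while accuracy is reported for evaluation:

```python
import numpy as np

# Predicted probabilities of the positive class from two hypothetical models.
y_true = np.array([1, 1, 0, 0])
p_a = np.array([0.90, 0.60, 0.40, 0.10])   # confident and correct
p_b = np.array([0.51, 0.51, 0.49, 0.49])   # barely correct

def cross_entropy(y, p):
    """Log loss: smooth and differentiable, convenient to optimise."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy(y, p):
    """0/1 accuracy at a 0.5 threshold: the metric we often report."""
    return np.mean((p >= 0.5) == y)

print(accuracy(y_true, p_a), accuracy(y_true, p_b))          # 1.0  1.0
print(cross_entropy(y_true, p_a), cross_entropy(y_true, p_b))  # ~0.31 vs ~0.67
```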
