Solved – Why don’t we use regularization on decision tree splits

overfitting, random forest, regularization

I have heard people ask which is better: linear regression with regularization or a Random Forest. My question is: why can't we use regularization with a Random Forest?

My understanding is that regularization techniques add a penalty term to a cost function such as cross-entropy in order to reduce overfitting to the training data.
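For concreteness, the usual shape of such a penalized objective (my notation, not from the original post) is the cross-entropy plus a weighted penalty on the parameters:

$$J(\theta) = -\sum_i \big[\, y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \,\big] + \lambda \lVert \theta \rVert_2^2$$

where $\lambda$ controls how strongly the parameters $\theta$ are shrunk toward zero.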

Typically, the technique for preventing overfitting in decision trees is to use a Random Forest. I have never heard anyone associate regularization with decision trees or Random Forests.

Is it that reducing overfitting at the individual node level is too complex?

Edit: I acknowledge that RF is already a form of regularization for decision trees, through bootstrapping and aggregating. RF can be imagined as estimating the answer by aggregating the answers of a crowd rather than asking a single expert (the wisdom of the crowd).
I am not asking what regularization for RF is. I am asking why we can't also apply regularization at the single-split step of a decision tree, since that would just mean adding a penalty term to the cross-entropy used to decide the split, the same way a penalty is added to the cost function in regularized linear regression. Is it because there is no point, since we don't use cross-entropy to update feature values via gradient descent? A sketch of what I mean follows.
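To make the question concrete, here is a rough sketch (my own illustration, not an established algorithm) of what "regularizing a single split" might look like: the usual entropy-based information gain with a penalty subtracted, so a split would only be accepted if its gain exceeds the penalty. The name `penalized_gain` and the strength `lam` are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def penalized_gain(parent, left, right, lam=0.05):
    """Information gain of a candidate split minus a fixed penalty.

    `lam` is a hypothetical regularization strength; the split would only
    be accepted if the penalized gain is positive, biasing the tree
    toward not splitting.
    """
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(parent) - weighted_child_entropy
    return gain - lam

# Example: a candidate split of 10 labels into two pure children
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]
print(penalized_gain(parent, left, right))
```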

Best Answer

Random forest already has regularization; it is just not in the form of a penalty added to a cost function. A random forest doesn't have a global cost function in the same sense as linear regression; it just greedily maximizes information gain at each split. Limiting the minimum child node size, requiring a minimum information gain, and so on all change how the trees are constructed and impose regularization on the model, in the sense that a proposed split must be "large enough" to be accepted.
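These constraints are exposed as hyperparameters in common implementations. A minimal sketch using scikit-learn's RandomForestClassifier (the specific values below are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each of these hyperparameters constrains how the trees are grown,
# which is the random-forest analogue of regularization:
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=6,                  # cap tree depth
    min_samples_leaf=10,          # each child node must be "large enough"
    min_impurity_decrease=0.001,  # a split must reduce impurity by at least this much
    max_features="sqrt",          # decorrelate trees by subsampling features per split
    random_state=0,
)
print(cross_val_score(rf, X, y, cv=5).mean())
```

Tightening any of these (shallower trees, larger leaves, a higher impurity threshold) plays the same role as increasing a penalty weight in a regularized linear model.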
