Solved – Are decision trees sensitive to log translations in feature space

cart, gini, machine learning, rpart

This question was partially answered in "Are decision trees sensitive to translations in feature space?", but no references were provided for the claim that "Gini impurity and entropy measures are translation invariant".
I couldn't find material on this topic, so does anyone know either:
a) Why Gini is translation invariant, or
b) Why the results of a decision tree would otherwise be insensitive to transformations (e.g. log) of the feature space?
Thanks.

Best Answer

It depends on the algorithm used to build the tree. CART trees are invariant to strictly monotonic transformations of the features, not just to changes of scale, so a log transform (of positive-valued features) should not change how the tree partitions the training data. The values in the split rules, however, will be expressed on the log scale.
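As a sketch of why the impurity itself is unaffected (the notation here is my own, not taken from a reference): the Gini impurity of a node $t$ depends only on the class proportions $p_k(t)$ of the observations in it, and a split on feature $x_j$ at threshold $s$ is scored by the resulting decrease in impurity.

$$
G(t) = 1 - \sum_k p_k(t)^2,
\qquad
\Delta G = G(t) - \frac{n_L}{n}\,G(t_L) - \frac{n_R}{n}\,G(t_R)
$$

For any strictly increasing function $g$ (such as $\log$ on positive values),

$$
\{\, i : x_{ij} \le s \,\} = \{\, i : g(x_{ij}) \le g(s) \,\}
$$

so the child nodes $t_L$ and $t_R$ contain the same observations, and $\Delta G$ is unchanged. The same argument applies to entropy, which is also a function of the class proportions only.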

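The invariance follows from how the splitting process works: it sorts each numeric feature and then evaluates candidate split points, typically the midpoints between successive distinct values, for impurity improvement. The best improvement across all candidate split points and features is chosen for that node, and the process continues recursively. The impurity of a candidate split depends only on which observations fall on each side of it, so if you transform any feature in a way that preserves the relative ordering of its values, as a log transform does, the tree will group the training observations identically; only the split values change, because they now sit on the log scale. One caveat: when thresholds are placed at midpoints, the midpoint of two log values is not the log of the original midpoint, so a new observation that falls between two adjacent training values may occasionally be routed differently.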
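The sorting-and-midpoint search described above can be illustrated with a minimal pure-Python sketch. This is not rpart's implementation; the helper names `gini` and `best_split` and the toy data are hypothetical, and only a single feature and a single split are considered. It is meant to show that the raw and log-transformed feature yield the same partition of the observations, even though the thresholds differ.

```python
import math

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, left_indices) for the split on one numeric
    feature that minimizes the weighted Gini impurity of the children."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # indices sorted by feature value
    best = None
    for k in range(1, n):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # tied values cannot be separated by a threshold
        left, right = order[:k], order[k:]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / n
        if best is None or score < best[0]:
            # candidate threshold is the midpoint between neighbouring values
            best = (score, (lo + hi) / 2, frozenset(left))
    return best[1], best[2]

# Toy data (hypothetical): one positive-valued feature, two classes.
x = [1.0, 2.0, 4.0, 50.0, 100.0, 1000.0]
y = [0, 0, 0, 1, 1, 1]
log_x = [math.log(v) for v in x]

thr_raw, left_raw = best_split(x, y)
thr_log, left_log = best_split(log_x, y)

same_partition = left_raw == left_log
print("raw threshold:", thr_raw)
print("log threshold:", thr_log, "(back on the raw scale:", math.exp(thr_log), ")")
print("same partition of observations:", same_partition)
```

In this example both runs send the first three observations to the left child. The raw threshold is the arithmetic midpoint of 4 and 50, while the log-scale threshold corresponds to their geometric mean on the raw scale, which is the midpoint caveat in concrete form.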
