Machine Learning – Variable Scaling and Normalization Requirements for Decision Trees

cart, feature-selection, machine-learning

In many machine learning algorithms, feature scaling (a.k.a. variable scaling, normalization) is a common preprocessing step (Wikipedia – Feature Scaling); this question is closely related: Question #41704 – How and why do normalization and feature scaling work?

I have two questions specifically in regards to Decision Trees:

  1. Are there any decision tree implementations that would require feature scaling? I am under the impression that most algorithms' splitting criteria are indifferent to scale.
  2. Consider these variables: (1) Units, (2) Hours, (3) Units per Hour — is it best to leave these three variables "as-is" when fed into a decision tree, or do we run into some type of conflict because the derived variable (3) is computed from (1) and (2)? That is, would you throw all three variables into the mix, choose some combination of the three, or simply use the derived feature (3)?

Best Answer

For 1, decision trees in general don't require scaling: splits depend only on the ordering of values within each feature, so any strictly monotonic rescaling produces the same tree. Scaling can still help with data visualization/manipulation, and is useful if you intend to compare performance against scale-sensitive methods such as an SVM.
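To make the scale-invariance point concrete, here is a minimal sketch (synthetic data, scikit-learn assumed, not from the original question) fitting the same tree on raw and standardized features:

```python
# Sketch: a decision tree's fit is invariant to per-feature standardization.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Three features on wildly different scales
X = rng.normal(size=(200, 3)) * np.array([1.0, 1000.0, 0.001])
y = (X[:, 0] + X[:, 1] / 1000.0 > 0).astype(int)

X_scaled = StandardScaler().fit_transform(X)

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Splits depend only on the ordering of values within each feature, so a
# strictly monotonic rescaling yields the same splits and predictions.
same_predictions = bool(
    (tree_raw.predict(X) == tree_scaled.predict(X_scaled)).all()
)
print(same_predictions)
```

Running the same comparison with an SVM or k-NN would generally *not* give identical predictions, which is where scaling starts to matter.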

For 2, this is a question of feature engineering and tuning. Units per hour is a form of variable interaction and may have predictive power that neither raw variable has alone: a tree can only make axis-aligned splits, so it approximates a ratio of two features poorly unless you hand it the ratio directly. This really depends on your data, though. I'd try fitting with and without the derived feature and see if there is a difference.
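The "try with and without" advice can be sketched as a quick cross-validated comparison. This uses synthetic data where the target is driven by the ratio itself (the case where the derived feature should help most), so the numbers are illustrative, not a general claim:

```python
# Sketch: compare a tree on (units, hours) against one that also
# sees the derived units/hours feature, via 5-fold cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
units = rng.uniform(1.0, 100.0, size=500)
hours = rng.uniform(1.0, 10.0, size=500)
# Target driven by the ratio, plus noise
y = units / hours + rng.normal(scale=0.5, size=500)

X_base = np.column_stack([units, hours])
X_ratio = np.column_stack([units, hours, units / hours])

scores = {}
for name, X in [("units+hours", X_base), ("units+hours+ratio", X_ratio)]:
    model = DecisionTreeRegressor(max_depth=5, random_state=0)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()  # mean R^2
    print(name, round(scores[name], 3))
```

With the ratio available, the tree can split on it directly; without it, the tree has to carve out the hyperbolic relationship with axis-aligned boxes, which costs depth and accuracy. On real data the gap may be smaller or absent, which is why the comparison is worth running rather than assuming.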
