Normalization – Why Scaling/Normalization is Not Needed for Tree-Based Models

Tags: boosting, multidimensional scaling, normalization, random forest

I could not find a good answer/reference that explains why random forests, decision trees, and GBMs are not susceptible to the scale of the values of numerical variables.

My sense is that, since boosting methods penalize large errors more heavily, they should certainly be susceptible to the scale of the feature variables.

I have a dataset where most values fall between 0 and 100, but some are an order of magnitude larger, in the thousands. Should I scale them?

Based on your experience, does it help to scale features in tree-based algorithms?

Best Answer

If you are scaling the outcome variable, all you are doing is multiplying everything by a constant and/or adding a constant. So, any effect it has is irrelevant (i.e., it does not change anything in relative terms).
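
As a minimal sketch of this point (using scikit-learn's DecisionTreeRegressor, which is my assumption; the question is not tied to any library), you can check that linearly rescaling the outcome simply applies the same transformation to the predictions, leaving the fitted tree structure and relative errors unchanged:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[5], [5], [6], [6]])
y = np.array([0.0, 1.0, 2.0, 3.0])

tree_raw = DecisionTreeRegressor(max_depth=1).fit(X, y)
# Rescale the outcome: multiply by a constant and add a constant.
tree_scaled = DecisionTreeRegressor(max_depth=1).fit(X, 100 * y + 7)

# Predictions differ only by the same linear transformation applied to y.
print(np.allclose(tree_scaled.predict(X), 100 * tree_raw.predict(X) + 7))  # True
```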

In the case of the predictors, the scale of the predictor variables is not a determinant of the predictions in any way with a traditional tree-based model. For example, consider the following simple example with 4 observations, where y is the outcome and x is the predictor.

y x
0 5
1 5
2 6
3 6

The optimal split for predicting y given x is somewhere between x = 5 and x = 6. Let's say 5.5.

Now, if we scale x by multiplying it by 100, we change our optimal split to, say, 550. But our predictions (and thus our error) are completely unchanged.

y x
0 500
1 500
2 600
3 600
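
Here is a minimal sketch of that same example in code (again assuming scikit-learn, which the answer does not name): multiplying x by 100 moves the learned split threshold from 5.5 to 550, but the predictions are identical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

y = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([[5], [5], [6], [6]])

tree_x = DecisionTreeRegressor(max_depth=1).fit(x, y)
tree_x100 = DecisionTreeRegressor(max_depth=1).fit(100 * x, y)

print(tree_x.tree_.threshold[0])     # 5.5
print(tree_x100.tree_.threshold[0])  # 550.0
print(np.allclose(tree_x.predict(x), tree_x100.predict(100 * x)))  # True
```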