Solved – Why log-transform to normal distribution for decision trees

cart, machine learning

On page 304 of chapter 8 of An Introduction to Statistical Learning with Applications in R (James et al.), the authors say:

We use the Hitters data set to predict a baseball player’s Salary based on Years (the number of years that he has played in the major leagues) and Hits (the number of hits that he made in the previous year). We first remove observations that are missing Salary values, and log-transform Salary so that its distribution has more of a typical bell-shape. (Recall that Salary is measured in thousands of dollars.)

No additional motivation for the log-transform is given. Given that the data are being fed into a decision tree algorithm, why was it important to force the response into a roughly normal distribution? I thought most, if not all, decision tree algorithms were invariant to scale changes.

Best Answer

In this case, Salary is the target (dependent variable/outcome) of the decision tree, not one of the features (independent variables/predictors). You are correct that decision trees are insensitive to the scale of the predictors, but the scale of the response still matters: a regression tree chooses splits by minimizing the residual sum of squares and predicts the mean response within each leaf, and both of these are strongly influenced by extreme values. Since a small number of players have extremely large salaries, log-transforming Salary can improve predictions because the squared-error criterion is no longer dominated by those few large values.
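
A minimal sketch in R of the book's setup, using the ISLR package's Hitters data and the tree package (the same tools ISLR chapter 8 uses); comparing the two summaries is my own illustration, not something the book walks through:

```r
library(ISLR)   # provides the Hitters data set
library(tree)   # regression trees, as used in ISLR chapter 8

# Drop players with missing Salary, as the authors do
hitters <- na.omit(Hitters[, c("Salary", "Years", "Hits")])

# The raw salaries are heavily right-skewed; the log-transform
# gives them more of a bell shape
hist(hitters$Salary)
hist(log(hitters$Salary))

# Tree fit on raw salaries: splits minimize RSS, so the handful
# of very large salaries dominates the loss
fit_raw <- tree(Salary ~ Years + Hits, data = hitters)

# Tree fit on log-salaries: the squared-error criterion is no
# longer driven by the extreme values
fit_log <- tree(log(Salary) ~ Years + Hits, data = hitters)

summary(fit_raw)
summary(fit_log)
```

Plotting the two histograms shows the skew directly, and comparing the fitted trees shows how the split structure changes once the extreme salaries stop dominating the RSS.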