Machine Learning – Do CART Trees Capture Interactions Among Predictors?

Tags: cart, classification, data mining, machine learning

Paper 1 (Lee and Abbott, 2003) claims that because CART performs each binary split on a single covariate, all splits are orthogonal, and therefore interactions among covariates are not considered.

However, many well-regarded references claim the opposite: that the hierarchical structure of a tree guarantees that interactions between predictors are modeled automatically (e.g., Paper 2, and of course The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman).

Who's right? Do CART-grown trees capture interactions among input variables?

References:

Paper 1: Lee, Sun-Mi, and Patricia A. Abbott. "Bayesian networks for knowledge discovery in large datasets: basics for nurse researchers." Journal of Biomedical Informatics 36.4-5 (2003): 389-399.

Paper 2: Elith, Jane, John R. Leathwick, and Trevor Hastie. "A working guide to boosted regression trees." Journal of Animal Ecology 77.4 (2008): 802-813.

Best Answer

CART can capture interaction effects. An interaction effect between $X_1$ and $X_2$ occurs when the effect of explanatory variable $X_1$ on response variable $Y$ depends on the level of $X_2$. This happens in the following example:


The effect of poor economic conditions (call this $X_1$) depends on the type of building being purchased ($X_2$). For an office building, poor economic conditions decrease the predicted value of the investment by 140,000 dollars; for an apartment building, they decrease it by only 20,000 dollars. Because the effect of $X_1$ on the predicted value differs across the levels of $X_2$, this is an interaction effect.
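To make this concrete, here is a minimal sketch in plain Python of a greedy, CART-style regression tree grown on four made-up data points. Only the two differences (140,000 and 20,000 dollars) come from the example above; the four leaf values, the 0/1 feature coding, and the names `DATA`, `grow`, and `predict` are assumptions chosen for illustration. This is not how any particular library implements CART.

```python
# Hypothetical investment values in thousands of dollars. Only the two
# differences (-140 and -20) come from the example; the levels are made up.
# Features: (is_office, poor_economy), each coded 0 or 1.
DATA = [
    ((0, 0), 200.0),  # apartment, good economy
    ((0, 1), 180.0),  # apartment, poor economy
    ((1, 0), 300.0),  # office, good economy
    ((1, 1), 160.0),  # office, poor economy
]


def mean(values):
    return sum(values) / len(values)


def sse(rows):
    """Sum of squared errors of the responses around their mean."""
    ys = [y for _, y in rows]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def grow(rows):
    """Greedy growth: each split tests a single binary feature."""
    best = None
    for j in range(len(rows[0][0])):
        left = [r for r in rows if r[0][j] == 0]
        right = [r for r in rows if r[0][j] == 1]
        if not left or not right:
            continue  # this feature cannot split the node
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, j, left, right)
    if best is None or best[0] >= sse(rows):
        return mean([y for _, y in rows])  # leaf: predict the node mean
    _, j, left, right = best
    return (j, grow(left), grow(right))


def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are (feature, left, right)."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree


tree = grow(DATA)

# Effect of a poor economy, computed separately for each building type.
effect_office = predict(tree, (1, 1)) - predict(tree, (1, 0))
effect_apartment = predict(tree, (0, 1)) - predict(tree, (0, 0))

# A nonzero difference between the two effects is the interaction.
interaction = effect_office - effect_apartment

print(effect_office, effect_apartment, interaction)  # -140.0 -20.0 -120.0
```

Every split here tests one variable, yet the second-level splits are made inside the branches created by the first split, so each leaf corresponds to a combination of both variables. A purely additive model would force `interaction` to be zero; the tree does not.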