Solved – multiple predictors in decision tree model

cart, r

I am using about 10 predictors in my decision tree, but the rpart function uses only about 8 of them. Does that mean the remaining 2 are not needed, or that they are redundant? For example, I have age, and a child variable coded as 1/0 based on age, but the tree uses only age and not child.

Best Answer

Tree models do not have the collinearity problems that typical regression models have: even if you have two variables that are perfectly collinear, a tree model will simply pick one of them. If the collinearity is not perfect, the tree model will pick whichever variable does the better job at the particular node being split; it is even possible for one variable to be used at one node and another, very similar variable to be used at a different node.
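A minimal sketch of the near-collinear case, on simulated data (the variable names `x1`, `x2` are invented for illustration). At each node rpart uses whichever predictor splits better, but the "losing" variable can still receive credit in `$variable.importance` through surrogate splits:

```r
library(rpart)

set.seed(42)
n  <- 400
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)            # highly, but not perfectly, correlated with x1
y  <- factor(x1 + rnorm(n, sd = 0.5) > 0)

fit <- rpart(y ~ x1 + x2, method = "class")

# Importance counts surrogate splits, so a variable can score here
# even when it never appears as a primary splitter in the tree.
print(fit$variable.importance)
```

So an unused predictor is not necessarily uninformative; it may simply lose the split-quality comparison to a near-duplicate at every node.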

So, the reason rpart doesn't use those two variables is either that they don't improve the tree enough (by whatever splitting criterion you have selected) or that the splits involving them were pruned away.
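To see this with the asker's age/child situation, here is a sketch on simulated data (not the actual dataset): `child` is perfectly determined by `age`, so once `age` is available, `child` can never improve a split.

```r
library(rpart)

set.seed(1)
n     <- 500
age   <- sample(1:80, n, replace = TRUE)
child <- as.integer(age < 18)            # redundant recoding of age
y     <- factor(ifelse(age < 18, "minor", "adult"))

fit <- rpart(y ~ age + child, method = "class")

# List the predictors that actually appear as splitters in the fitted tree.
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used)   # typically just one of the two, since they split identically
```

With perfect redundancy like this, dropping the unused variable loses nothing; with imperfect correlation, it is worth checking `fit$variable.importance` before discarding it.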