Solved – Decision Tree Interpretation (Classification using rpart)


When using rpart to create a classification tree, the values for the relative importance of each predictor show up along these lines:

Var1: 33
Var2: 31
Var3: 25
Var4: 3

In my case, Var3 is plotted as the root node. I expected Var1 to be the root node, given that it has the highest relative importance. Based on this, is it reasonable to expect that Var1 through Var3 would show up more often and/or closer to the root of the tree? The same question applies to decision trees in general.


Best Answer

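Var3 is the predictor that provides the best separation between the two child nodes after a single binary split of the full data set, which is why it is chosen as the root. Relative importance is a different quantity: it is accumulated over every split in the tree, so a predictor that appears several times further down the tree can end up with greater overall importance than the root variable. In rpart, the importance score also credits a variable for splits in which it serves as a surrogate, so a variable can rank highly without being the primary split variable near the top.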
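
To make the distinction concrete, here is a minimal sketch in plain Python (not rpart itself). The tree, the per-split gains, and the helper names `importance` and `root_variable` are all hypothetical, chosen only so the totals match the numbers in the question. It sums the improvement of each primary split per variable and deliberately leaves out the surrogate-split credit that rpart also adds.

```python
from collections import defaultdict

# Hypothetical tree: one entry per split, with its depth (0 = root), the
# primary split variable, and the improvement ("gain") that split achieved.
# All numbers are invented for illustration.
splits = [
    {"depth": 0, "var": "Var3", "gain": 25.0},  # best single split -> root
    {"depth": 1, "var": "Var1", "gain": 18.0},
    {"depth": 1, "var": "Var2", "gain": 16.0},
    {"depth": 2, "var": "Var1", "gain": 15.0},
    {"depth": 2, "var": "Var2", "gain": 15.0},
    {"depth": 2, "var": "Var4", "gain": 3.0},
]


def importance(splits):
    """Sum the gain of every split per variable (surrogates ignored)."""
    totals = defaultdict(float)
    for s in splits:
        totals[s["var"]] += s["gain"]
    return dict(totals)


def root_variable(splits):
    """Return the variable used at the root (depth 0)."""
    return next(s["var"] for s in splits if s["depth"] == 0)


totals = importance(splits)
ranking = sorted(totals, key=totals.get, reverse=True)

print("root:", root_variable(splits))
print("importance ranking:", ranking)
```

In this toy tree, Var3 wins the root because its single split (25) beats any other single split, while Var1 and Var2 each split twice lower down and accumulate larger totals (33 and 31). So the importance ranking and the position in the tree do not have to agree.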