Solved – Relation between decision tree depth and number of attributes

cart, machine-learning, overfitting, weka

In machine learning libraries such as Weka, we can set a tree to have unlimited depth with maxDepth = -1. I am curious to know what would happen if trees were set to a depth far greater than the number of attributes/features available. In other words, what if we went on a reverse-pruning spree? Would it lead to overfitting the data, or would it make the tree perform much better?

Best Answer

You cannot actually set the depth of the tree, only the maximum possible depth. In that case, the tree stops growing (or, more precisely, a node stops expanding) as soon as the splitting criterion (typically the Gini impurity decrease or information gain) falls below some threshold (typically 0). Increasing the minimum number of instances per leaf (minNumObj in Weka) is another way to make the tree stop growing.
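
To make these stopping criteria concrete, here is a minimal sketch using scikit-learn rather than Weka (the parameter names differ, but the mechanics are the same; max_depth=None plays the role of Weka's maxDepth = -1, and min_samples_leaf is the analogue of minNumObj):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No explicit depth cap: nodes keep expanding until the
# stopping criteria below kick in.
tree = DecisionTreeClassifier(
    max_depth=None,              # analogue of Weka's maxDepth = -1
    criterion="gini",            # Gini impurity as the splitting criterion
    min_impurity_decrease=0.0,   # stop once a split no longer reduces impurity
    min_samples_leaf=1,          # analogue of Weka's minNumObj
)
tree.fit(X, y)
print("actual depth reached:", tree.get_depth())
```

Even with no depth cap, the tree stops well short of "infinite" depth once every leaf is pure or too small to split.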

Then why consider the number of features as a benchmark? Depth and number of features are independent parameters: a continuous feature can be split on repeatedly at different thresholds, so the depth of a tree is not bounded by the number of features (see the sketch below). I am not aware of any "rule of thumb" for a presumably "reasonable" maximal depth depending on the number of features.
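
A quick illustration of that independence, again sketched in scikit-learn with synthetic data: with only 2 features, a fully grown tree still reaches a depth far above 2, because it keeps re-splitting the same features at different thresholds.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))        # only 2 features
y = rng.integers(0, 2, size=500)     # random labels force many splits

tree = DecisionTreeClassifier(max_depth=None).fit(X, y)
print("features:", X.shape[1], "| depth:", tree.get_depth())
# Typically prints a depth well above 2.
```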

In any case, setting maxDepth = -1 is not a way to improve performance. Whether it leads to overfitting depends very much on the dataset at hand; severe overfitting is definitely possible.
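
Here is a hedged sketch of what that overfitting can look like on a noisy synthetic dataset (again scikit-learn, with flip_y injecting label noise): the uncapped tree typically scores near 1.00 on the training set but noticeably lower on the test set, while a shallow tree has a much smaller gap.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```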
