Solved – Relation between decision tree Depth and number of Attributes

Tags: cart, machine-learning, overfitting, weka

In machine learning libraries such as Weka, we can set a tree to unlimited depth with maxDepth = -1. I am curious to know what would happen if trees were allowed a depth far greater than the number of attributes/features available. In other words, what if we went on a reverse-pruning spree? Would it lead to overfitting the data, or would it make the tree perform much better?
Related Solutions
Depending on which tree algorithm you're using, there is usually a regularization parameter that defines the cost of splitting: a chosen feature is split only if the gain in accuracy exceeds that cost. By playing with this parameter, you can allow more splits and therefore explore more depth (while still keeping depth constrained). Keep in mind that good tree algorithms also have a pruning step at the end, which again checks whether or not to keep certain branches.
Also, most split choices are made in a pseudorandom way, because in general it would be prohibitively expensive to test all possible splits. Alternatively, a multi-tree model would quite possibly capture the extra splits you mention across a handful of trees, if they really do increase accuracy enough given the trees' current structure. A sketch of the split-cost idea follows below.
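As a minimal sketch of that split-cost parameter, assuming scikit-learn's CART implementation as a stand-in (min_impurity_decrease is its version of the cost of splitting; ccp_alpha drives the pruning step):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

for threshold in (0.0, 0.001, 0.01):
    # A node is split only if it reduces impurity by at least `threshold`,
    # so raising the threshold yields shallower trees with fewer leaves.
    tree = DecisionTreeClassifier(min_impurity_decrease=threshold,
                                  random_state=0).fit(X, y)
    print(f"threshold={threshold}: depth={tree.get_depth()}, "
          f"leaves={tree.get_n_leaves()}")
```

Raising the threshold is the "more regularization, fewer splits" direction; setting it to 0 lets the tree grow until splitting stops paying off at all.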
You seem to be fine-tuning the wrong things.
On your feature selection: I don't think it is done properly:
- You remove the good feature and all linearly correlated features. That's fine, but higher-order correlated features are still there. On the other hand, strong correlation does not always mean that a feature is useless.
- Instead, you should keep the good feature in the set and remove the features that are useless. The goal is to still have a high score in the end. This way you make sure you don't remove good features, because removing one would show up as a drop in the score (see the sketch after this list).
- You should train a good model (at least once in a while) in order to know which features are helpful.
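A minimal sketch of that score-guarded backward elimination, assuming an XGBoost classifier, cross-validated accuracy as the score, and placeholder data (all assumptions, not the original setup):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
kept = list(range(X.shape[1]))

def score(features):
    # A reasonably good model, retrained so we know which features help.
    model = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.2)
    return cross_val_score(model, X[:, features], y, cv=3).mean()

baseline = score(kept)
for f in list(kept):
    trial = [c for c in kept if c != f]
    trial_score = score(trial)
    if trial_score >= baseline:   # feature was useless: score did not drop
        kept, baseline = trial, trial_score
print("kept features:", kept)
```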
For the hyperparameter optimization:
- You should fix all parameters except n_estimators at the beginning, then optimize that one parameter over a reasonably fine grid (from 10 to 500 in steps of 20, for example).
My general suspicion: way too many estimators, too low a learning_rate (at this stage), and trees that are too shallow (set depth to 6). Maybe try the following:
- eta = 0.2
- n_estimators = [50...400]
- subsample = [0.8]
- depth = 6
and leave the rest as is (a grid-search sketch over these settings follows below). Of course, the right values depend strongly on the data.
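A minimal sketch of that search, assuming scikit-learn's GridSearchCV with xgboost's sklearn wrapper and placeholder data (the step size of 50 is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# eta, subsample and depth fixed as suggested; only n_estimators is searched.
model = XGBClassifier(learning_rate=0.2, subsample=0.8, max_depth=6)
grid = {"n_estimators": list(range(50, 401, 50))}  # the [50...400] range above
search = GridSearchCV(model, grid, cv=3).fit(X, y)
print(search.best_params_, search.best_score_)
```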
A nice guide for XGBoost hyperparameter optimization can be found here.
So I'd propose you redo the feature selection, keeping the good features in the set, and occasionally re-optimize a good XGBoost configuration along the way. Also, do not forget to create a small holdout set that you do not use during feature selection; it can be used at the end to measure the real performance.
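Carving out that holdout set can be as simple as the following sketch (assuming scikit-learn and placeholder data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Set the holdout aside first and never touch it during feature selection
# or tuning; evaluate on it exactly once, at the very end.
X_work, X_holdout, y_work, y_holdout = train_test_split(
    X, y, test_size=0.1, random_state=0)
```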
Best Answer
You cannot actually set the depth of the tree, only the maximal possible depth. Even with no limit, the tree stops growing (or, to be more precise, a node stops expanding) as soon as the splitting criterion (typically Gini impurity or information gain) falls below some threshold (typically 0). Increasing the minimum number of instances per leaf (minNumObj in Weka) is another reason for a tree to stop growing.

Then why consider the number of features as a benchmark? Depth and number of features are independent parameters. I am not aware of any "rule of thumb" for a presumably "reasonable" maximal depth depending on the number of features.
Anyway, setting maxDepth = -1 is not a way to improve performance. Whether it leads to overfitting depends very much on the dataset at hand. Huge overfitting is definitely possible.
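To see the effect concretely, here is a small sketch (assuming scikit-learn's DecisionTreeClassifier, where max_depth=None plays the role of Weka's maxDepth = -1, on noisy synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Label noise (flip_y) makes unlimited depth prone to memorizing noise.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 5, 10, None):   # None ~ Weka's maxDepth = -1
    tree = DecisionTreeClassifier(max_depth=depth,
                                  random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```

On noisy data like this, training accuracy typically climbs toward 1.0 as depth grows while test accuracy stalls or drops, which is exactly the overfitting warned about above. Note also that the unconstrained tree can grow well past the 20 available features, since depth and feature count are independent.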