Solved – Information gain in random trees

cart, entropy

When constructing a random tree, I use information gain to choose the best value to split each node on. I keep adding nodes to the tree until a stopping criterion is met. What minimum value of information gain should I use as that stopping criterion?
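For concreteness, the setup described above might look like the following minimal sketch. The function names (`entropy`, `information_gain`, `best_split`) and the `min_gain` parameter with its default of 0.01 are illustrative assumptions, not taken from any particular library, and the sketch handles a single numeric attribute only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the node on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_split(values, labels, min_gain=0.01):
    """Return (threshold, gain) for the best split, or None to stop growing.

    min_gain is the hypothetical stopping threshold the question asks about.
    """
    # The largest value is skipped: splitting there leaves the right child empty.
    candidates = sorted(set(values))[:-1]
    scored = [(information_gain(values, labels, t), t) for t in candidates]
    if not scored:
        return None
    gain, threshold = max(scored)
    return (threshold, gain) if gain >= min_gain else None
```

Calling `best_split` on a node whose labels are already pure returns `None`, which is the point at which the tree would stop adding nodes.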

Best Answer

The minimum information gain required for a split is a tunable parameter and should probably be chosen by cross-validation on a problem-by-problem basis. You can also run a statistical significance test on each split, asking whether the split provides a statistically significant increase in information gain over a random split. Check out pages 12 and 13 from this PDF for an example of statistical tests on splits.
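One way to make the "better than a random split" idea concrete is a permutation test: shuffle the class labels many times and count how often a shuffled labelling yields at least as much gain as the real one. The sketch below is only an illustration of that idea, not necessarily the test used in the referenced PDF. The names `split_gain` and `permutation_p_value`, the default of 1000 permutations, and the fixed seed are all assumptions.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(goes_left, labels):
    """Information gain of a binary partition given as a list of booleans."""
    left = [y for g, y in zip(goes_left, labels) if g]
    right = [y for g, y in zip(goes_left, labels) if not g]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def permutation_p_value(goes_left, labels, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels match or beat the observed gain."""
    rng = random.Random(seed)
    observed = split_gain(goes_left, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in the comparison.
        if split_gain(goes_left, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

A split would then be accepted only if the returned p-value falls below a chosen significance level (0.05, say), which replaces a fixed gain threshold with a threshold on the p-value. The significance level is itself a parameter that could be tuned by cross-validation.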
