Party – Pruning Binary-Classification Trees Based on Practical Relevance


I'm creating a binary classification model to develop relevant segments for a business problem. The ctree() function does a great job, especially in combination with the minbucket argument, which avoids leaves that are too small.

Because my dataset is quite large (several hundred thousand cases), the algorithm often produces leaves that differ only slightly in the proportion of positive cases.

Ideally, I'd like to prune the tree so that a node is only split if the proportion of positive cases differs by more than, say, 5 percentage points between the resulting leaves. I've been trying to use mincriterion, which thresholds the underlying test statistic, but it hasn't had an impact so far (probably because the dataset is so large).
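To illustrate the problem, here is a small self-contained sketch (the data frame and response are simulated, not from the question): with a large sample, even a very strict mincriterion keeps splitting on tiny proportion differences, because those differences are still highly significant.

```r
library(partykit)

## Simulated data: positive rate differs by only ~4 percentage points
set.seed(1)
n <- 100000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$resp <- factor(rbinom(n, 1, ifelse(dat$x1 > 0, 0.52, 0.48)),
                   labels = c("neg", "pos"))

## Even mincriterion = 0.9999 (i.e., p-value threshold of 0.0001)
## does not stop the split: with n this large the test is significant
ct <- ctree(resp ~ ., data = dat,
            control = ctree_control(minbucket = 500,
                                    mincriterion = 0.9999))
width(ct)  ## number of terminal nodes
```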

Best Answer

At the moment the partykit package does not have a high-level function that provides this kind of stopping rule. We have been working on an implementation flexible enough for users to roll their own stopping criteria, but this still needs much more work.

However, the package has low-level functions that allow you to implement your idea as a post-pruning technique: extract the fitted probabilities from all nodes of the tree (e.g., using nodeapply()), drop the splits you don't want (using nodeprune()), and subsequently set up the full constparty object again.
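The idea above could be sketched as follows. This is not a partykit-provided routine but one possible implementation: it computes the positive-class proportion per node via data_party() (the helper prop_pos and the 5-percentage-point threshold are assumptions for illustration), finds inner nodes whose terminal descendants all lie within that band, and hands their ids to nodeprune().

```r
library(partykit)

## Simulated stand-in for the questioner's data
set.seed(1)
n <- 100000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$resp <- factor(rbinom(n, 1, plogis(0.2 * dat$x1)),
                   labels = c("neg", "pos"))
ct <- ctree(resp ~ ., data = dat,
            control = ctree_control(minbucket = 500))

## Proportion of positive cases among the observations in node 'id'
prop_pos <- function(tree, id) {
  resp <- data_party(tree, id = id)[["(response)"]]
  mean(resp == levels(resp)[2L])
}

## Inner nodes whose terminal descendants differ by < 5 percentage points
inner <- setdiff(nodeids(ct), nodeids(ct, terminal = TRUE))
to_prune <- Filter(function(id) {
  term <- nodeids(ct, from = id, terminal = TRUE)
  diff(range(sapply(term, prop_pos, tree = ct))) < 0.05
}, inner)

## Drop those splits; nodeprune() returns a valid constparty again
ct_pruned <- if (length(to_prune)) nodeprune(ct, ids = to_prune) else ct
```

Note that to_prune may contain nested node ids (an inner node together with inner nodes of its own subtree); pruning the ancestor already removes the descendants, so depending on the tree you may want to keep only the topmost qualifying ids before calling nodeprune().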
