I am trying to build a prediction model using classification trees. I tried the "rpart" package, but the results were not entirely satisfactory, so I thought of exploring conditional inference trees as well (the "party" package in R).
Now, the documentation for the "ctree" function states the following: "For example, when mincriterion = 0.95, the p-value must be smaller than 0.05 in order to split this node. This statistical approach ensures that the right sized tree is grown and no form of pruning or cross-validation or whatsoever is needed"
However, with the default mincriterion value of 0.95, I end up with just 1 split. Would it make sense to vary the mincriterion value (say, from 0.95 to 0.90), cross-validate the resulting models, and pick the one with the lowest CV error?
If yes, is there a function within the party package which can help me do this? (roughly analogous to a "plotcp/printcp" function that we have in rpart)
Thanks!
Best Answer
In those situations where p-values work well (e.g., in small to moderately sized samples), the pre-pruning strategy employed in conditional inference trees typically works well. (Pre-pruning means you stop growing the tree when some condition is fulfilled - rather than first growing a larger tree and then pruning it back.)
However, it is, of course, possible to treat the significance level as a tuning parameter and choose its value based on cross-validation, out-of-bag performance, etc. This can help avoid overfitting in large datasets, where essentially all p-values are significant. The strategy is implemented in the `caret` package as `train(..., method = "ctree")`.

Finally, it would be conceivable to first grow a large tree (with low `mincriterion`) and then prune it based on information criteria, cost-complexity, etc. But I think this is not readily available for conditional inference trees in an R package at the moment. If you're doing binary classification, you might consider a logit `glmtree()`, which offers AIC- and BIC-based post-pruning. This is in the `partykit` package, which also contains the recommended re-implementation of `ctree()`.
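
A minimal sketch of the two suggestions above, assuming the `caret`, `party`, and `partykit` packages are installed. The simulated data frame `d`, the grid of `mincriterion` values, the 10-fold CV setup, and the `alpha = 0.9` setting are illustrative assumptions, not something taken from the question.

```r
library(caret)

# Simulated binary classification data (illustrative assumption):
# only x1 carries signal, x2 and x3 are noise.
set.seed(1)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
p <- ifelse(d$x1 > 0.5, 0.8, 0.2)
d$y <- factor(ifelse(runif(n) < p, "yes", "no"))

# 1) Treat mincriterion as a tuning parameter and pick it by 10-fold CV.
#    caret's "ctree" method fits party::ctree() for each grid value.
ctrl <- trainControl(method = "cv", number = 10)
grid <- data.frame(mincriterion = c(0.50, 0.80, 0.90, 0.95, 0.99))

set.seed(2)
fit <- train(y ~ ., data = d, method = "ctree",
             trControl = ctrl, tuneGrid = grid)

print(fit$results)   # CV accuracy and kappa for each mincriterion value
print(fit$bestTune)  # the mincriterion value selected by CV

# 2) Logit model-based tree with BIC-based post-pruning:
#    grow a deliberately large tree (lenient alpha), then prune it back.
gt <- partykit::glmtree(y ~ 1 | x1 + x2 + x3, data = d,
                        family = binomial, alpha = 0.9, prune = "BIC")
print(gt)
```

Calling `plot(fit)` on the `train` object shows the cross-validated performance across the `mincriterion` grid, which is probably the closest analogue to `plotcp()` in rpart.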