Wouldn't any decision tree trained on a training data set have no errors in classification? In other words, wouldn't every data point be classified correctly in the training data set? How would this tie in with the misclassification rate?
Solved – Decision Trees on training data
cart
Related Solutions
It looks to me like classregtree is just building a tree, not using any of these methods, all of which are supplementary to tree building. That is, classregtree is implementing the methods described in Breiman et al., per the reference given in the documentation. It builds a tree and then (by default) prunes it.
It's generally the case that, if you're trying to optimize some scoring metric (classification accuracy, Brier score, log-loss, etc.), it's more effective to use modeling procedures (tree learning, tree pruning) that optimize it directly. So the default attitude would be that, if you're trying to maximize classification accuracy, you should both train and prune your tree based on classification accuracy.
However, there are a couple of things that might motivate you to make exceptions to this and not train your tree based on classification accuracy:
The tree learning algorithm is greedy, so trying to maximize classification accuracy at each step may not end up selecting the accuracy-maximizing classifier overall. This is exacerbated because classification accuracy is an insensitive, noisy criterion: if you try too hard to optimize it, you will end up fitting on noise and overfitting. By contrast, doing accuracy-based pruning at the end is less prone to the fitting-on-noise issue because you're making fewer choices, so the argument for optimizing your target metric directly carries more weight there.
Classification accuracy is not a proper scoring rule, so trying too hard to maximize it can cause your classifier to return predictably bad probabilities.
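To make the proper-scoring-rule point concrete, here is a small sketch (in Python with scikit-learn rather than classregtree, and on a synthetic dataset I've made up for illustration): a tree grown to purity is, in effect, optimized for hard classification and emits only probabilities of exactly 0 or 1, so its held-out log-loss is terrible on any point it misclassifies, while a depth-limited tree keeps more honest probability estimates.

```python
# Illustration (assumption: scikit-learn trees, synthetic noisy data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import log_loss

# flip_y adds label noise, so memorizing the training labels is harmful
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grown until every leaf is pure: predicted probabilities are all 0 or 1
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Depth-limited: leaves stay impure, so probabilities stay graded
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The fully grown tree's 0/1 probabilities blow up the held-out log-loss
print(log_loss(y_te, deep.predict_proba(X_te)))
print(log_loss(y_te, shallow.predict_proba(X_te)))
```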
For the same reason I described above, if you are trying to minimize the Brier score of the resulting tree, you might want to prune using the Gini index (which is essentially the Brier score). If you are trying to minimize the log-loss of the resulting tree (which is essentially cross-entropy), you might want to prune using cross-entropy.
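As a rough sketch of matching the impurity to the score you care about (again in scikit-learn rather than classregtree, which is an assumption on my part): the `criterion` argument selects Gini impurity or cross-entropy, and cost-complexity pruning via `ccp_alpha` is driven by that same impurity.

```python
# Sketch (assumption: scikit-learn's DecisionTreeClassifier as a stand-in).
# "gini" tracks the Brier score; "entropy" (cross-entropy) tracks log-loss.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import brier_score_loss, log_loss

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for criterion in ("gini", "entropy"):
    # ccp_alpha prunes using the same impurity that guided the splits
    tree = DecisionTreeClassifier(criterion=criterion, ccp_alpha=0.01,
                                  random_state=0).fit(X_tr, y_tr)
    p = tree.predict_proba(X_te)[:, 1]
    print(criterion, brier_score_loss(y_te, p), log_loss(y_te, p))
```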
Best Answer
A decision tree trained on a training data set would only have no errors in classification if it were grown until every terminal node is pure, that is, until no two observations with identical predictor values have different classes. Theoretically, then, you could have a large series of branches leading to terminal nodes that each contain a single observation, giving the correct classification for every point in the training data.
This model, however, would not generalize to new data: it would most likely perform very poorly when applied to new (test) data, because you have overfitted it. Therefore, when building a classification tree, pruning must be performed.
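The overfitting described above is easy to demonstrate with a short sketch (Python/scikit-learn on a synthetic noisy dataset, as an illustration rather than the answer's own setup): an unpruned tree scores perfectly on the data it was grown on while doing noticeably worse on held-out data.

```python
# Hedged illustration: an unpruned tree can memorize the training set
# (one observation per leaf in the limit).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y injects label noise, so the memorized fit cannot generalize
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.3,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
print(tree.score(X_tr, y_tr))   # 1.0: zero misclassification on training data
print(tree.score(X_te, y_te))   # noticeably lower on unseen data
```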
To prune your model you use the complexity parameter, which balances the trade-off between overfitting the model and the misclassification rate.
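The linked references cover rpart's `cp` parameter in R; as a rough Python analogue (an assumption on my part: scikit-learn's `ccp_alpha` plays the same cost-complexity role), you can sweep the candidate alphas returned by `cost_complexity_pruning_path` and pick the pruned tree that does best on held-out data:

```python
# Sketch: choose the cost-complexity penalty by held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Effective alphas at which subtrees get pruned away, smallest first
path = full.cost_complexity_pruning_path(X_tr, y_tr)

best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas[:-1]),  # drop the last alpha: it prunes to the root
    key=lambda t: t.score(X_te, y_te),
)
print(best.get_n_leaves(), best.score(X_te, y_te))
```

In practice you would select the alpha by cross-validation on the training data (as rpart's `xerror` column does) rather than on the test set; the test set is used here only to keep the sketch short.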
See Using Tree-Based Models in R for a good explanation in R
Also see the Choosing The Complexity Parameter instructions.