Solved – Decision Trees on training data

cart

Wouldn't any decision tree trained on a training data set have no errors in classification? In other words, wouldn't every data point be classified correctly in the training data set? How would this tie in with the misclassification rate?

Best Answer

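A decision tree has zero classification errors on its training data only if:

  • You allow the tree to keep splitting with no limit on depth or node size. (The number of splits never needs to be infinite: n observations require at most n - 1 splits.)

  • No two observations have identical predictor values but different class labels, since no split can separate them.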

In that case the tree can grow a long series of branches ending in terminal nodes that each contain a single observation (or several observations of the same class), so every point in the training data is classified correctly.
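The behaviour described above can be checked with a small sketch. It is a toy tree written for illustration, not the CART implementation from any library: the function names (build, predict, training_error), the Gini-based split choice, and the tiny data sets are all assumptions made for this example.

```python
from collections import Counter

def majority(labels):
    # Most common label in a node.
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build(X, y, max_depth=None, depth=0):
    # Stop when the node is pure or the (optional) depth limit is reached.
    if len(set(y)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Every threshold below the largest value leaves both sides non-empty.
        for t in values[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        # All rows are identical, so no split exists: return a leaf.
        return majority(y)
    _, j, t, left, right = best
    return (j, t,
            build([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            build([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))

def predict(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def training_error(tree, X, y):
    # Misclassification rate on the data the tree was built from.
    return sum(predict(tree, row) != label for row, label in zip(X, y)) / len(y)

# Made-up data: all rows are distinct, labels are not cleanly separable.
X = [[1, 1], [2, 1], [3, 2], [4, 2], [5, 3], [6, 3]]
y = ["a", "b", "a", "a", "b", "a"]

full_err = training_error(build(X, y), X, y)                # unlimited splits
stump_err = training_error(build(X, y, max_depth=1), X, y)  # a single split

# Two identical rows with conflicting labels: no tree can get both right.
X_dup = [[1], [1], [2]]
y_dup = ["a", "b", "a"]
dup_err = training_error(build(X_dup, y_dup), X_dup, y_dup)

print(full_err, stump_err, dup_err)
```

With unlimited splits the training error is zero, while the depth-limited tree and the data set with conflicting duplicates both leave some training points misclassified.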

This model, however, would not generalize to new data. It would most likely perform very poorly on new (test) data, because it has been overfitted to the training set: its training misclassification rate of zero says nothing about how it will do on unseen observations. Therefore, when building a classification tree, you should prune it.

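To prune your model, you use the complexity parameter, which controls the trade-off between the size of the tree and its misclassification rate on the training data. A larger value prunes more aggressively, accepting more training errors in exchange for a simpler tree that is less likely to overfit.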
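As a sketch of how this trade-off is usually written, the standard CART cost-complexity criterion is shown below. The symbols are not from the answer itself: R(T) denotes the misclassification rate of tree T on the training data, |\tilde{T}| its number of terminal nodes, and \alpha \ge 0 the complexity parameter.

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|
```

With \alpha = 0 the fully grown tree minimizes the criterion, since it has the lowest training error. As \alpha increases, each extra terminal node has to reduce the training error by more than \alpha to be worth keeping, so smaller trees win. As far as I know, the cp argument in R's rpart is a rescaled version of \alpha, so the two values should not be assumed to be numerically interchangeable.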

See "Using Tree-Based Models in R" for a good explanation in R.

Also see the "Choosing The Complexity Parameter" instructions.
