How to calculate the total error of a decision tree

cart, error, probability

If I have a decision tree, say with two leaves, how do I calculate the total error? I can calculate the error for each leaf, but is the total error the sum of those errors, their product, or neither?

Best Answer

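The total error is the sum of the individual leaves' errors (the number of misclassified cases in each leaf), divided by the total number of predictions. Equivalently, it is the average of the leaf error rates weighted by how many cases fall into each leaf, so it is neither the plain sum nor the product of the leaf error rates.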
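To make the "sum of errors over all predictions" rule concrete, here is a minimal sketch in R using hypothetical leaf counts (the names `leaf_n` and `leaf_errors` and all the numbers are made up for illustration, not taken from a real fit). It also shows that the plain sum and the product of the per-leaf error rates give different, wrong answers.

```r
# Hypothetical two-leaf tree (illustrative numbers, not from a real fit):
# how many cases reach each leaf, and how many of those the leaf's
# predicted class gets wrong
leaf_n      <- c(leaf1 = 60, leaf2 = 40)
leaf_errors <- c(leaf1 = 12, leaf2 = 2)

leaf_rate <- leaf_errors / leaf_n            # error rate within each leaf: 0.20 and 0.05

total_errors <- sum(leaf_errors)             # add up the misclassifications...
total_rate   <- total_errors / sum(leaf_n)   # ...and divide by all predictions

# The same number is the leaf error rates averaged with weights = leaf sizes
weighted_rate <- weighted.mean(leaf_rate, w = leaf_n)

print(total_errors)      # [1] 14
print(total_rate)        # [1] 0.14
print(weighted_rate)     # [1] 0.14

# Neither the plain sum nor the product of the leaf rates matches
print(sum(leaf_rate))    # [1] 0.25
print(prod(leaf_rate))   # [1] 0.01
```

The weighted-mean form is useful when your software reports only per-leaf error rates and leaf sizes rather than raw counts.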

Most likely the easiest way to do this is to form a confusion matrix for your model; most software that fits decision trees can also produce one. Here is an example (coded in R) adapted from my answer here:

library(party)  # ctree() fits the tree
library(caret)  # confusionMatrix() summarizes the predictions

Cond.1 = c(2.9, 3.0, 3.1, 3.1, 3.1, 3.3, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.7,
           3.8, 3.8, 3.8, 3.8, 3.9, 4.0, 4.0, 4.1, 4.1, 4.2, 4.4, 4.5, 4.5, 4.5, 4.6,
           4.6, 4.6, 4.7, 4.8, 4.9, 4.9, 5.5, 5.5, 5.7)
Cond.2 = c(2.3, 2.4, 2.6, 3.1, 3.7, 3.7, 3.8, 4.0, 4.2, 4.8, 4.9, 5.5, 5.5, 5.5, 5.7,
           5.8, 5.9, 5.9, 6.0, 6.0, 6.1, 6.1, 6.3, 6.5, 6.7, 6.8, 6.9, 7.1, 7.1, 7.1,
           7.2, 7.2, 7.4, 7.5, 7.6, 7.6, 10, 10.1, 12.5)

dat        = stack(list(cond1=Cond.1, cond2=Cond.2))  # the data: 'values' plus the group factor 'ind'
cart.model = ctree(ind~values, data=dat)              # fits a tree
# windows()  # optional: opens a new plot window, Windows only
plot(cart.model)


confusionMatrix(predict(cart.model), dat$ind)  # this is the confusion matrix
# Confusion Matrix and Statistics
# 
#           Reference
# Prediction cond1 cond2
#      cond1    39    15
#      cond2     0    24
#     
#                Accuracy : 0.8077          
# ...       
#                                           
#             Sensitivity : 1.0000          
#             Specificity : 0.6154          
#          Pos Pred Value : 0.7222          
#          Neg Pred Value : 1.0000          
#              Prevalence : 0.5000          
#          Detection Rate : 0.5000          
#    Detection Prevalence : 0.6923          
#       Balanced Accuracy : 0.8077 

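The error rate is 1 - accuracy (1 - 0.8077 = 0.1923). To get the raw number of misclassified cases, sum the off-diagonal elements of the confusion matrix (0 + 15 = 15); dividing that by the total number of predictions (78) gives the same rate, 15/78 = 0.1923.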
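The arithmetic above can be checked in base R without caret. This minimal sketch hard-codes the four counts printed by `confusionMatrix()` rather than refitting the tree; with the fitted model in hand, base R's `table(predict(cart.model), dat$ind)` should give the same counts.

```r
# Confusion matrix copied from the confusionMatrix() output above
# (rows = predicted class, columns = true class; matrix() fills column by column)
cm <- matrix(c(39, 0, 15, 24), nrow = 2,
             dimnames = list(Prediction = c("cond1", "cond2"),
                             Reference  = c("cond1", "cond2")))

n_total   <- sum(cm)              # all predictions (78)
n_correct <- sum(diag(cm))        # diagonal = correct predictions (39 + 24)
n_wrong   <- n_total - n_correct  # off-diagonal = misclassifications (15 + 0)

accuracy   <- n_correct / n_total
error_rate <- n_wrong / n_total   # identical to 1 - accuracy

print(n_wrong)               # [1] 15
print(round(error_rate, 4))  # [1] 0.1923
print(round(accuracy, 4))    # [1] 0.8077
```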