Solved – rpart and the printcp function

classification, cross-validation, r, rpart

I don't really understand how the columns "xerror" and "rel error" are calculated.
I found out that the printcp() function "gives cross-validation estimates of misclassification error (xerror), standard errors (xstd) of those estimates and the training (resubstitution) estimates (error)".

Here is an example:

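library(rpart)
# stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0 as well
car <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", sep=",", stringsAsFactors=TRUE)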
set.seed(2)
tree <- rpart(V7~.,data=car)
printcp(tree)
# OUTPUT #####################################
Classification tree:
rpart(formula = V7 ~ ., data = car)

Variables actually used in tree construction:
[1] V1 V2 V4 V5 V6

Root node error: 518/1728 = 0.29977

n= 1728 

        CP nsplit rel error  xerror     xstd
1 0.129344      0   1.00000 1.00000 0.036767
2 0.115830      2   0.74131 0.85135 0.034987
3 0.040541      4   0.50965 0.50965 0.028872
4 0.030888      7   0.38803 0.38803 0.025729
5 0.027027      9   0.32625 0.35328 0.024694
6 0.023166     10   0.29923 0.30888 0.023261
7 0.017375     12   0.25290 0.26641 0.021754
8 0.015444     14   0.21815 0.23552 0.020557
9 0.010000     16   0.18726 0.18726 0.018472
############################################
# Misclassification error
sum(predict(tree, type="class")!=car$V7)/nrow(car)
[1] 0.05613426

Our misclassification error seems to be 0.05613426, but the printcp output reports 0.18726. How can this happen?
I also tried 10-fold cross-validation to estimate the expected misclassification error and got 0.05497715.
So how is the 0.18726 in the xerror and rel error columns of the printcp output calculated?

Best Answer

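Both 'rel error' and 'xerror' are reported relative to the 'Root node error', so multiply them by the root node error to get absolute misclassification rates. Here $0.29977 \times 0.18726 \approx 0.0561$, which matches the resubstitution error you computed (0.05613426) and is close to your 10-fold cross-validation estimate (0.05497715). In the last row 'xerror' happens to equal 'rel error', so the rescaled cross-validated error comes out to the same value.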
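
As a rough check of that rescaling, here is a minimal sketch in base R. It uses only the numbers printed in the question rather than refitting the tree; the names `root_node_error`, `cp_table`, `abs_error` and `abs_xerror` are made up for illustration.

```r
# Values copied by hand from the printcp() output in the question.
root_node_error <- 518 / 1728   # the "Root node error" line

cp_table <- data.frame(
  nsplit    = c(0, 2, 4, 7, 9, 10, 12, 14, 16),
  rel_error = c(1.00000, 0.74131, 0.50965, 0.38803, 0.32625,
                0.29923, 0.25290, 0.21815, 0.18726),
  xerror    = c(1.00000, 0.85135, 0.50965, 0.38803, 0.35328,
                0.30888, 0.26641, 0.23552, 0.18726)
)

# Rescale the relative errors to absolute misclassification rates.
cp_table$abs_error  <- cp_table$rel_error * root_node_error
cp_table$abs_xerror <- cp_table$xerror    * root_node_error

print(round(cp_table, 5))

# The last row corresponds to the final tree (16 splits); its abs_error
# should agree with the resubstitution error from the question (97/1728).
final <- cp_table[nrow(cp_table), ]
print(final$abs_error)
```

On a fitted object the same table should be available as `tree$cptable`, and the root node error as `tree$frame$dev[1] / tree$frame$n[1]`, so the hand-copied numbers would not be needed. Treat that as a pointer to check against your own fit, since it is not run here.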