Solved – rpart and the printcp function

I don't really understand how the columns "xerror" and "rel error" are calculated.
According to the documentation, the printcp() function "gives cross-validation estimates of misclassification error (xerror), standard errors (xstd) of those estimates and the training (resubstitution) estimates (error)".

Here is an example:

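library(rpart)  # provides rpart() and printcp()
car <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", sep=",")
set.seed(2)     # the xerror column comes from randomly chosen cross-validation folds
tree <- rpart(V7~.,data=car)
printcp(tree)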
# OUTPUT #####################################
Classification tree:
rpart(formula = V7 ~ ., data = car)

Variables actually used in tree construction:
[1] V1 V2 V4 V5 V6

Root node error: 518/1728 = 0.29977

n= 1728 

        CP nsplit rel error  xerror     xstd
1 0.129344      0   1.00000 1.00000 0.036767
2 0.115830      2   0.74131 0.85135 0.034987
3 0.040541      4   0.50965 0.50965 0.028872
4 0.030888      7   0.38803 0.38803 0.025729
5 0.027027      9   0.32625 0.35328 0.024694
6 0.023166     10   0.29923 0.30888 0.023261
7 0.017375     12   0.25290 0.26641 0.021754
8 0.015444     14   0.21815 0.23552 0.020557
9 0.010000     16   0.18726 0.18726 0.018472
############################################
# Misclassification error on the training data
sum(predict(tree, type="class")!=car$V7)/nrow(car)
[1] 0.05613426

The misclassification error on the training data comes out as 0.05613426, but the last row of the printcp output reports 0.18726. How can this be?
I also ran a 10-fold cross-validation to estimate the expected misclassification error and got 0.05497715.
So how is the 0.18726 in the xerror and rel error columns of the printcp output calculated?
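
One reading that fits the numbers above: the "rel error" and "xerror" columns appear to be expressed relative to the root node error, which is why the first row is exactly 1. Under that assumption, multiplying a row's value by the root node error (518/1728) should recover an absolute misclassification rate. The sketch below only redoes that arithmetic with values copied from the output; the variable names are illustrative and not part of rpart.

```r
# Values copied from the printcp() output above (illustrative variable names)
root_node_error <- 518 / 1728  # "Root node error" line
rel_error       <- 0.18726     # last row, "rel error" column
xerror          <- 0.18726     # last row, "xerror" column

# Assumption: both columns are scaled so that the root node has error 1,
# so multiplying by the root node error gives an absolute error rate.
resub_error <- rel_error * root_node_error  # training (resubstitution) error
cv_error    <- xerror * root_node_error     # cross-validated error

print(round(resub_error, 4))
print(round(cv_error, 4))

# Difference from the directly computed training error, 0.05613426
print(abs(resub_error - 0.05613426))
```

If this reading is right, the rescaled "rel error" agrees with the directly computed 0.05613426 up to rounding of the printed table. The rescaled "xerror" would not be expected to match a separately run 10-fold cross-validation exactly, since the folds are drawn at random.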
