Solved – How are CP (Cost Complexity) values calculated in RPART (or decision trees in general)

cart, r, rpart

From what I understand, the cp argument to the rpart function helps pre-prune the tree in the same way as the minsplit or minbucket arguments. What I don't understand is how the CP values in the output are computed. For example:

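library(rpart)
df <- data.frame(x = c(1, 2, 3, 3, 3, 4), y = as.factor(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)))
# y is a factor, so rpart defaults to method = "class"; it is not a data.frame argument
mytree <- rpart(y ~ x, data = df, minbucket = 1, minsplit = 1)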

Resulting tree…

mytree
n= 6 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 6 3 FALSE (0.5000000 0.5000000)  
  2) x>=2.5 4 1 FALSE (0.7500000 0.2500000) *
  3) x< 2.5 2 0 TRUE (0.0000000 1.0000000) *

Summary…

summary(mytree)

Call:
rpart(formula = y ~ x, data = df, minbucket = 1, minsplit = 1)
  n= 6 

         CP nsplit rel error    xerror      xstd
1 0.6666667      0 1.0000000 2.0000000 0.0000000
2 0.0100000      1 0.3333333 0.6666667 0.3849002

Where are the 0.667 and the 0.01 coming from?

Best Answer

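I had been searching for the same thing for several days, and what I found is that the CP values are calculated by the package itself. If you do not specify cp, rpart uses its default of 0.01, and that default is the 0.01 shown in the last row of the table. The CP value is the cost of adding another node to the tree: a split is only kept if it reduces the relative error by at least cp.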
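As for the 0.667, it can be reproduced by hand from the printed tree. The sketch below is only my reading of the table, not rpart's internal code: it assumes that rel error is the tree's misclassification count divided by the root's, and that the CP in a row is the drop in rel error per split added to reach the next row. The variable names are made up for illustration, and the counts are copied from the output in the question.

```r
# Counts copied by hand from the printed tree in the question
root_loss <- 3        # node 1 (root): 3 of 6 observations misclassified
leaf_loss <- c(1, 0)  # nodes 2 and 3: losses after the split at x = 2.5
nsplit    <- 1        # splits added between row 1 and row 2 of the table

# rel error: misclassifications of the tree, scaled so the root equals 1
rel_error_root  <- root_loss / root_loss
rel_error_split <- sum(leaf_loss) / root_loss

# CP of the first row: drop in rel error per added split (assumed definition)
cp_first <- (rel_error_root - rel_error_split) / nsplit

# cp is 2/3 and rel_error is 1/3, the values in rows 1 and 2 of the table
print(round(c(cp = cp_first, rel_error = rel_error_split), 7))
```

If you have the fitted object at hand, you can compare these numbers with mytree$cptable. Splitting node 2 any further would not lower the misclassification count in this data, so no split clears the 0.01 threshold and the table stops at one split.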