From what I understand, the cp argument to the rpart function helps pre-prune the tree in the same way as the minsplit or minbucket arguments. What I don't understand is how the CP values are computed. For example:
library(rpart)
# y is a factor, so rpart picks method = "class" on its own
df <- data.frame(x = c(1, 2, 3, 3, 3, 4), y = as.factor(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)))
mytree <- rpart(y ~ x, data = df, minbucket = 1, minsplit = 1)
Resulting tree…
mytree
n= 6
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 6 3 FALSE (0.5000000 0.5000000)
2) x>=2.5 4 1 FALSE (0.7500000 0.2500000) *
3) x< 2.5 2 0 TRUE (0.0000000 1.0000000) *
Summary…
summary(mytree)
Call:
rpart(formula = y ~ x, data = df, minbucket = 1, minsplit = 1)
n= 6
         CP nsplit rel error    xerror      xstd
1 0.6666667      0 1.0000000 2.0000000 0.0000000
2 0.0100000      1 0.3333333 0.6666667 0.3849002
Where are the 0.666 and 0.01 coming from?
Best Answer
I had been searching for the same thing for days, and what I found is that the package takes care of calculating the cp values itself. If you do not specify cp, rpart uses a default of 0.01. The cp value is the cost of adding another node to the tree: a split is only attempted if it decreases the overall lack of fit by at least a factor of cp.
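
As a rough sketch of where the 0.6666667 comes from: the CP column appears to be the drop in relative error per added split, with every error scaled by the error of the root node. Assuming that reading, the numbers can be reproduced by hand from the loss column of the printed tree (root loss 3, leaf losses 1 and 0). The variable names below are made up for illustration and are not part of rpart.

```r
# Misclassification counts read off the "loss" column of the printed tree
root_loss   <- 3        # node 1: 3 of 6 observations misclassified
leaf_losses <- c(1, 0)  # nodes 2 and 3, after the single split on x
n_splits    <- 1        # number of splits between the two cp table rows

# Relative error of the split tree: total leaf loss scaled by the root loss
rel_error <- sum(leaf_losses) / root_loss

# CP of the first row: improvement in relative error per added split
cp_first <- (1 - rel_error) / n_splits

print(c(rel_error = rel_error, cp_first = cp_first))
```

This gives a relative error of 1/3 and a CP of 2/3, matching the rel error and CP columns above. The 0.01 in the last row is not computed from the data; it is the cp threshold in effect, here the default.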