Solved – CART – Classification And Regression Trees

cart

I am trying to prune a regression tree built with the rpart function in R.
To decide where to prune the tree I used the plotcp function. But I noticed that even if I use the same predictor variables, in the same order, the plotcp graph changes every time. How is this possible?
Thank you in advance for your explanation.

Best Answer

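If you re-run just plotcp, you should get exactly the same plot. If you re-run rpart, however, the plot can change: by default rpart runs a 10-fold cross-validation (xval = 10 in rpart.control) with randomly assigned folds, and plotcp plots the resulting cross-validated error. You can avoid this by setting the same seed before each run of rpart.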

e.g.

library(rpart)  # provides rpart(), plotcp() and the kyphosis data

fit1 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
plotcp(fit1)

fit2 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
plotcp(fit2)

will yield the same tree but different cross-validated error estimates, because the cross-validation folds are drawn at random, and the two plots will reflect that. But

plotcp(fit2)
plotcp(fit2)
plotcp(fit2)

should be identical, as should the two plots from these seeded runs:

set.seed(10020101)
fit1 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
plotcp(fit1)

set.seed(10020101)
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
plotcp(fit2)
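
To check this without comparing plots by eye, you can compare the fitted objects directly. The following is a minimal sketch, assuming the default rpart settings (xval = 10); the names fit_a to fit_d, same_tree and same_cptable are only illustrative. It compares the tree structure stored in the frame component and the cross-validation table stored in the cptable component, which is what plotcp draws from.

```r
library(rpart)  # provides rpart() and the kyphosis data

# Two fits without a seed: the cross-validation folds are drawn at random.
fit_a <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit_b <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# The tree structure itself should not depend on the random folds.
same_tree <- isTRUE(all.equal(fit_a$frame, fit_b$frame))

# Two fits with the same seed: the folds, and hence the cptable, repeat.
set.seed(10020101)
fit_c <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
set.seed(10020101)
fit_d <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
same_cptable <- identical(fit_c$cptable, fit_d$cptable)

print(same_tree)
print(same_cptable)

# Side-by-side cross-validated errors of the two unseeded fits.
print(cbind(a = fit_a$cptable[, "xerror"], b = fit_b$cptable[, "xerror"]))
```

Both checks should come out TRUE, while the xerror columns of the two unseeded fits will usually differ from run to run. That is the variation you see in plotcp.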