I am fitting a regression tree with rpart and would like to understand how well it performs.
I know that rpart has cross-validation built in, so I should not need to split the dataset before training.
I build my tree and then print the cp table:
> fit <- rpart(slope ~ ., data = ph1)
> printcp(fit)
Regression tree:
rpart(formula = slope ~ ., data = ph1)
Variables actually used in tree construction:
[1] blocksize dimension maps reducers
Root node error: 8.9483/364 = 0.024583
n= 364
        CP nsplit rel error  xerror     xstd
1 0.517156      0   1.00000 1.00305 0.095998
2 0.155374      1   0.48284 0.48686 0.047503
3 0.116019      2   0.32747 0.37237 0.034623
4 0.029928      3   0.21145 0.22534 0.021952
5 0.018020      4   0.18152 0.21134 0.021075
6 0.016643      5   0.16350 0.20052 0.021303
7 0.015986      7   0.13022 0.18776 0.021119
8 0.010000      8   0.11423 0.15334 0.016906
This is where I get lost.
What do these numbers mean?
If this were a classification tree, I could follow them thanks to this question.
But what about a regression tree?
The test sample is here
Best Answer
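The CP table is the most important part of the rpart output. For each subtree it gives the complexity parameter (CP column), the training error (rel error), and the cross-validation error (xerror), with xstd being the standard error of xerror. For a regression tree, rel error and xerror are both expressed relative to the root node error, which is why the first row (no splits) has a rel error of exactly 1.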
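Because rel error and xerror are reported as fractions of the root node error, they can be converted back to the scale of the response. The sketch below does this with the numbers copied from the table in the question. It uses base R only, and the variable names are my own, chosen for illustration.

```r
# Values copied from the printcp() output in the question
root_node_error <- 8.9483 / 364  # root sum of squares divided by n
rel_error <- 0.11423             # last row of the table (8 splits)
xerror    <- 0.15334

# Multiplying by the root node error turns the relative errors
# back into mean squared errors on the scale of slope^2
train_mse <- rel_error * root_node_error
cv_mse    <- xerror * root_node_error

# One minus the relative error can be read as an R-squared:
# apparent (training) and cross-validated
train_r2 <- 1 - rel_error
cv_r2    <- 1 - xerror

print(c(train_mse = train_mse, cv_mse = cv_mse))
print(c(train_r2 = train_r2, cv_r2 = cv_r2))
```

Read this way, the largest tree explains about 89% of the variance of slope on the training data and about 85% under cross-validation, so xerror is the column to look at when judging how well the tree will perform on new data.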
I have a set of notes on how each number is calculated, although they are for a regression on the mtcars data set. They are not directly about your data, but I think they answer your question well. Sorry, the annotations might be a little messy.
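As a rough sketch of the same idea on mtcars, the rel error in the last row of the table can be reproduced by hand. The choice of mpg as the response and all variable names here are my assumptions for illustration, and the seed is only there because the cross-validation folds behind xerror are random.

```r
library(rpart)

set.seed(1)  # xerror and xstd depend on the random CV folds
fit <- rpart(mpg ~ ., data = mtcars, method = "anova")
printcp(fit)

# Root node error as printed by printcp(): total sum of squares / n
root_sse <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
root_node_error <- root_sse / nrow(mtcars)

# Training sum of squared errors of the full (unpruned) tree
tree_sse <- sum((mtcars$mpg - predict(fit))^2)

# rel error of the last row = tree SSE / root SSE
rel_error_by_hand <- tree_sse / root_sse
cp_table <- fit$cptable
rel_error_table <- unname(cp_table[nrow(cp_table), "rel error"])

print(c(by_hand = rel_error_by_hand, from_table = rel_error_table))
```

The two printed values should agree up to floating point error. xerror is built the same way, except that the squared errors come from predictions on held-out folds (10 by default, set through the xval argument of rpart.control).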
I would suggest reading page 20 of the rpart manual and, if possible, the original CART book.