Solved – Performance of regression tree rpart

cart, cross-validation, r, rpart

I am running a regression tree using rpart and I would like to understand how well it is performing.

I know that rpart has cross-validation built in, so I should not split off a test set before training.
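(For reference, the number of built-in cross-validation folds is controlled by the xval argument of rpart.control, which defaults to 10; the sketch below just makes that explicit and otherwise matches my call.)

library(rpart)

# xval = 10 reproduces the default 10-fold cross-validation that
# produces the xerror and xstd columns shown by printcp()
fit <- rpart(slope ~ ., data = ph1,
             control = rpart.control(xval = 10))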

Now I build my tree and finally print the CP table.

> fit <- rpart(slope ~ ., data = ph1)
> printcp(fit)

Regression tree:
rpart(formula = slope ~ ., data = ph1)

Variables actually used in tree construction:
[1] blocksize dimension maps      reducers 

Root node error: 8.9483/364 = 0.024583

n= 364 

        CP nsplit rel error  xerror     xstd
1 0.517156      0   1.00000 1.00305 0.095998
2 0.155374      1   0.48284 0.48686 0.047503
3 0.116019      2   0.32747 0.37237 0.034623
4 0.029928      3   0.21145 0.22534 0.021952
5 0.018020      4   0.18152 0.21134 0.021075
6 0.016643      5   0.16350 0.20052 0.021303
7 0.015986      7   0.13022 0.18776 0.021119
8 0.010000      8   0.11423 0.15334 0.016906

Now I don't follow anymore.

What are those numbers?

If this were a classification tree, I could follow those numbers thanks to this question.

But what about a regression tree?

The test sample is here

Best Answer

The CP table is the most important part of the rpart output: it gives the complexity of the tree model (CP column), the training error (rel error), and the cross-validation error (xerror).
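Both rel error and xerror are reported relative to the root node error, so multiplying them by that value gives absolute (mean squared) errors. A minimal sketch, assuming fit is the tree from your question:

# fit$cptable holds the same numbers that printcp() displays
cp_tab     <- fit$cptable
root_error <- 8.9483 / 364          # the "Root node error" line above

data.frame(
  nsplit    = cp_tab[, "nsplit"],
  train_mse = cp_tab[, "rel error"] * root_error,  # resubstitution (training) error
  cv_mse    = cp_tab[, "xerror"]    * root_error,  # cross-validated error
  cv_se     = cp_tab[, "xstd"]      * root_error   # its standard error
)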

I have a set of notes on how each of these numbers is calculated, but I ran the regression on the mtcars data set. It is not directly your data, but I think it can answer your question well. Sorry, the annotation might be a little messy.

[Annotated notes (two images) walking through how the CP table numbers are calculated for the mtcars regression]
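If you prefer code to handwriting, here is a minimal sketch of the same calculations on mtcars (I regress mpg on the remaining columns; that choice of response is mine, not necessarily the one in the notes):

library(rpart)

set.seed(1)                                         # xerror depends on random CV folds
fit_mt <- rpart(mpg ~ ., data = mtcars)

n        <- nrow(mtcars)
sse_root <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # sum of squares around the mean
root_err <- sse_root / n                            # the "Root node error" line

sse_tree <- sum((mtcars$mpg - predict(fit_mt))^2)   # sum of squares of the full tree
rel_err  <- sse_tree / sse_root                     # matches the last "rel error" row

printcp(fit_mt)                                     # compare with the two numbers below
c(root_err = root_err, rel_err = rel_err)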

I would suggest that you read the rpart manual, page 20, and if possible the original CART book.
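As a practical follow-up, the usual way to act on the xerror column is to prune at the smallest tree whose xerror is within one standard error of the minimum (the 1-SE rule). A sketch, again assuming fit is your fitted tree:

cp_tab <- fit$cptable
best   <- which.min(cp_tab[, "xerror"])
thresh <- cp_tab[best, "xerror"] + cp_tab[best, "xstd"]   # 1-SE threshold
one_se <- which(cp_tab[, "xerror"] <= thresh)[1]          # simplest tree under it

pruned <- prune(fit, cp = cp_tab[one_se, "CP"])           # prune to that subtree
plotcp(fit)                                               # visual check of xerror vs cp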