I am new to R and the rpart package. When I fit a classification tree using rpart:
> temp_control <- rpart.control(xval=10, minbucket=2, minsplit=4, cp=0.0001)
> dfit <- rpart(Target~., data = temp_data, method = 'class', control=temp_control)
> printcp(dfit)
Then I get:
          CP nsplit rel error xerror     xstd
1 0.00189329      0   1.00000 1.0000 0.040140
2 0.00172117     28   0.92255 1.0861 0.041708
3 0.00114745     32   0.91566 1.0947 0.041861
4 0.00098353     41   0.90534 1.1102 0.042133
5 0.00086059     48   0.89845 1.1274 0.042433
6 0.00043029     62   0.88640 1.1515 0.042849
7 0.00034423     75   0.87952 1.1635 0.043055
8 0.00028686     80   0.87780 1.1687 0.043142
9 0.00010000     89   0.87263 1.1807 0.043346
Why does xerror increase as the tree grows? Do I need to adjust the parameters further? Also, I am wondering how the root node error is calculated. Does it depend only on the dataset, or is it also affected by the parameter settings?
I also tried the "anova" method, even though my response variable is categorical (Y/N). I recoded it as 0/1 and ran "anova", which gives:
           CP nsplit rel error  xerror     xstd
1  3.1473e-02      0   1.00000 1.00025 0.037408
2  1.1506e-02      1   0.96853 0.97164 0.035479
3  5.6396e-03      2   0.95702 0.96528 0.035172
4  4.6137e-03      3   0.95138 0.96970 0.035029
5  4.4412e-03      6   0.93754 0.97246 0.035019
6  4.3751e-03      7   0.93310 0.97006 0.034915
7  4.1352e-03     10   0.91997 0.97109 0.034912
8  3.5702e-03     11   0.91584 0.97316 0.034847
9  3.0148e-03     14   0.90513 0.96819 0.034671
10 2.5334e-03     15   0.90211 0.96872 0.034725
11 2.2789e-03     16   0.89958 0.96959 0.034753
12 2.2342e-03     17   0.89730 0.97437 0.034829
13 1.8732e-03     18   0.89507 0.98647 0.035104
14 1.8401e-03     19   0.89319 0.99511 0.035199
Does anyone have any idea about this?
Best Answer
xerror is the cross-validation error. Why does the validation error increase? Because of overfitting, which is exactly what cross-validation is meant to detect. In your case it makes perfect sense: more splits in the rpart tree mean a more complex model, which leaves more room for overfitting. Try the plotcp function to see where overfitting starts and to select the "right" tree size.
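
As a rough sketch of that workflow, the code below fits a deliberately large tree, plots the cross-validated error with plotcp, and prunes back to the row of the CP table with the smallest xerror. The question's temp_data is not available, so the kyphosis dataset that ships with rpart stands in for it; the object names (ctrl, fit, cp_tab, best, pruned, root_err) are made up for illustration. The last lines also touch on the root node error question: with the default priors and loss for method = "class", it should simply be the share of observations that are not in the majority class, so it depends on the data and not on the rpart.control settings.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed for repeatability

# Same control settings as in the question; kyphosis is a stand-in for temp_data
ctrl <- rpart.control(xval = 10, minbucket = 2, minsplit = 4, cp = 0.0001)
fit  <- rpart(Kyphosis ~ ., data = kyphosis, method = "class", control = ctrl)

plotcp(fit)  # cross-validated relative error against cp / tree size

# The table shown by printcp() is stored as a matrix in fit$cptable
cp_tab <- fit$cptable
best   <- which.min(cp_tab[, "xerror"])

# Prune back to the subtree with the lowest cross-validated error
pruned <- prune(fit, cp = cp_tab[best, "CP"])

# Root node error: misclassified cases at the root divided by the sample size
root_err <- fit$frame$dev[1] / fit$frame$n[1]
majority <- max(table(kyphosis$Kyphosis)) / nrow(kyphosis)
all.equal(root_err, 1 - majority)
```

Applied to the first table in the question, the smallest xerror is in row 1 (nsplit = 0), so this rule would prune all the way back to the root: none of the larger trees did better than the majority-class guess under cross-validation. A common alternative is the one-standard-error rule, which picks the smallest tree whose xerror is within one xstd of the minimum.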