The rpart() function in R returns a cptable that includes the columns xerror and xstd.
Here is an arbitrary example.
            CP nsplit rel error    xerror       xstd
1  0.161992664      0 1.0000000 1.0002790 0.01853630
2  0.043985638      1 0.8380073 0.8385070 0.01749290
3  0.030278222      2 0.7940217 0.7963870 0.01709283
4  0.013881619      3 0.7637435 0.7695997 0.01653832
5  0.010181164      4 0.7498619 0.7560406 0.01606136
6  0.008004043      5 0.7396807 0.7466449 0.01600352
7  0.007026176      6 0.7316767 0.7356289 0.01549501
8  0.006614587      8 0.7176243 0.7388091 0.01559568
9  0.005312278     10 0.7043951 0.7254237 0.01522645
10 0.004883811     11 0.6990828 0.7248227 0.01526605
Some argue that the tree should be pruned based on the minimum cross-validated error (xerror) and thus would prune at row 10, where the minimum xerror occurred.
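Others argue that the "1SE rule" advises finding the minimum and then choosing the simplest tree whose xerror is within one standard error of it, because that tree is less complex. Using column xstd, the threshold is 0.7248227 + 1*0.01526605 = 0.7400887. Row 7 is the first row whose xerror (0.7356289) falls below that threshold, so pruning should occur at row 7.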
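The two selection rules can be checked directly against the numbers in the table. The following is a minimal sketch in base R, assuming the cptable is re-entered by hand as a matrix; the variable names (cptable, i_min, threshold, i_1se, cp_1se) are illustrative and not part of rpart.

```r
# cptable values copied from the example above, entered by hand
cptable <- cbind(
  CP = c(0.161992664, 0.043985638, 0.030278222, 0.013881619, 0.010181164,
         0.008004043, 0.007026176, 0.006614587, 0.005312278, 0.004883811),
  nsplit = c(0, 1, 2, 3, 4, 5, 6, 8, 10, 11),
  "rel error" = c(1.0000000, 0.8380073, 0.7940217, 0.7637435, 0.7498619,
                  0.7396807, 0.7316767, 0.7176243, 0.7043951, 0.6990828),
  xerror = c(1.0002790, 0.8385070, 0.7963870, 0.7695997, 0.7560406,
             0.7466449, 0.7356289, 0.7388091, 0.7254237, 0.7248227),
  xstd = c(0.01853630, 0.01749290, 0.01709283, 0.01653832, 0.01606136,
           0.01600352, 0.01549501, 0.01559568, 0.01522645, 0.01526605)
)
rownames(cptable) <- 1:10

# Rule 1: row with the minimum cross-validated error
i_min <- which.min(cptable[, "xerror"])

# Rule 2 (1SE): minimum xerror plus one xstd gives the threshold;
# rows are ordered from simplest to most complex, so take the first
# row whose xerror does not exceed the threshold
threshold <- cptable[i_min, "xerror"] + cptable[i_min, "xstd"]
i_1se <- which(cptable[, "xerror"] <= threshold)[1]
cp_1se <- cptable[i_1se, "CP"]

print(c(min_row = unname(i_min), one_se_row = unname(i_1se)))
```

With a fitted model (call it fit, a hypothetical name), the same steps would apply to fit$cptable, and the selected value could then be passed to prune(fit, cp = cp_1se).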
See also this post:
How to choose the number of splits in rpart()?
My simple question: why is the column labeled "xstd" (presumably meaning cross-validated standard deviation), when people refer to this as the 1SE rule and not the 1SD rule?
Best Answer
'xstd' is simply a poor label; it should say 'xse', since the column actually reports the standard error rather than the standard deviation. If you select row 7 in the table above, you are properly applying the '1SE rule' as you intended.