R rpart cross validation and 1 SE rule, why is the column in cptable called “xstd”

cross-validation, rpart

The rpart() function in R returns a fitted object whose cptable component includes the columns xerror and xstd.
Here is an arbitrary example.

            CP nsplit rel error    xerror       xstd
1  0.161992664      0 1.0000000 1.0002790 0.01853630
2  0.043985638      1 0.8380073 0.8385070 0.01749290
3  0.030278222      2 0.7940217 0.7963870 0.01709283
4  0.013881619      3 0.7637435 0.7695997 0.01653832
5  0.010181164      4 0.7498619 0.7560406 0.01606136
6  0.008004043      5 0.7396807 0.7466449 0.01600352
7  0.007026176      6 0.7316767 0.7356289 0.01549501
8  0.006614587      8 0.7176243 0.7388091 0.01559568
9  0.005312278     10 0.7043951 0.7254237 0.01522645
10 0.004883811     11 0.6990828 0.7248227 0.01526605

Some argue that the tree should be pruned at the minimum cross-validated error (xerror), which here occurs at row 10.
Others argue that the "1SE rule" applies: find the minimum xerror, add one standard error, and choose the simplest tree whose xerror falls at or below that threshold. Using column xstd, the threshold is 0.7248227 + 1*0.01526605 = 0.7400887. Row 7 (xerror 0.7356289) is the first row under it, so pruning should occur at row 7.
See also this post:
How to choose the number of splits in rpart()?
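
The 1SE selection described above can be sketched in a few lines of R. This is only an illustration: the data frame is rebuilt by hand from the table in this question (with a real model the same columns would come from fit$cptable, where fit is a hypothetical rpart object), and the variable names are my own.

```r
# Values copied from the cptable shown in the question.
cptable <- data.frame(
  CP     = c(0.161992664, 0.043985638, 0.030278222, 0.013881619, 0.010181164,
             0.008004043, 0.007026176, 0.006614587, 0.005312278, 0.004883811),
  nsplit = c(0, 1, 2, 3, 4, 5, 6, 8, 10, 11),
  xerror = c(1.0002790, 0.8385070, 0.7963870, 0.7695997, 0.7560406,
             0.7466449, 0.7356289, 0.7388091, 0.7254237, 0.7248227),
  xstd   = c(0.01853630, 0.01749290, 0.01709283, 0.01653832, 0.01606136,
             0.01600352, 0.01549501, 0.01559568, 0.01522645, 0.01526605)
)

# Row with the smallest cross-validated error (row 10 here).
i_min <- which.min(cptable$xerror)

# Threshold: minimum xerror plus one times its xstd (about 0.7401 here).
threshold <- cptable$xerror[i_min] + cptable$xstd[i_min]

# Rows are ordered from simplest to most complex tree, so the first row
# at or below the threshold is the simplest acceptable tree (row 7 here).
i_1se  <- min(which(cptable$xerror <= threshold))
cp_1se <- cptable$CP[i_1se]

# With a real rpart fit (hypothetical object `fit`), pruning would then be:
# pruned <- rpart::prune(fit, cp = cp_1se)
```

Because xerror is not monotone in tree size (row 8 is slightly worse than row 7), taking the first row under the threshold rather than the row nearest the threshold is what keeps the selected tree as small as possible.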

My simple question: why is the column labeled "xstd" (presumably meaning cross-validated standard deviation) when people refer to this as the 1SE rule and not the 1SD rule?

Best Answer

'xstd' is simply a poor label. It should say 'xse', since the column holds the standard error of the cross-validated error, not a standard deviation. If you select row 7 in the table above, you are applying the '1SE Rule' correctly, as you intended.
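
For readers unsure about the distinction, here is a generic illustration of standard deviation versus standard error of a mean. It uses a made-up vector and is not a reproduction of how rpart computes xstd internally; it only shows why the two quantities differ in scale.

```r
# Made-up sample, purely for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Standard deviation: spread of the individual observations.
sd_x <- sd(x)

# Standard error of the mean: uncertainty of the average itself,
# which shrinks as the number of observations grows.
se_x <- sd_x / sqrt(length(x))
```

The 1SE rule is about uncertainty in the estimated error, which is why a standard error, not a standard deviation, is the relevant quantity.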