Cross-Validation – How to Apply One Standard Error Rule for Variable Selection

Tags: cross-validation, standard error

Breiman et al. recommend the 1-SE rule and show that it is successful at screening out noise variables. On page 80 of their book, I am confused about the '1 S.E. rule':
$$R'[T(k_l)]\leq R'[T(k_0)]+S.E\{R'[T(k_0)]\} $$

where $T_1, T_2, \dots$ is the sequence of trees (indexed by the number of variables) and $R'[T_1], R'[T_2], \dots$ are the corresponding K-fold cross-validation estimates of prediction error.
The tree selected is $T(k_l)$, where $k_l$ is the maximum $k$ satisfying the inequality above. Note that $R'[T(k_0)] = \min_k R'[T_k]$.
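To make the rule concrete, here is a small worked example with made-up numbers. Suppose the minimum cross-validated error is $R'[T(k_0)] = 0.24$ with $S.E.\{R'[T(k_0)]\} = 0.015$. Then every tree satisfying
$$R'[T(k)] \leq 0.24 + 0.015 = 0.255$$
is within one standard error of the minimum, and the rule picks the largest such $k$, i.e. the simplest of those trees.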

My question is: how do I calculate $S.E\{R'[T(k_0)]\}$? It is only one value. Please correct me where I am wrong.

Best Answer

Isn't it as simple as calculating the standard error of the mean of $R'[T_i]$ (for a given $i$), treating each cross-validation fold as an "independent" measurement? That is, compute the standard deviation of $R'[T_i]$ across the $K$ folds and divide by $\sqrt{K-1}$; this gives a reasonable resampling-based proxy for that standard error.
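A minimal sketch of this recipe, using only the standard library and made-up per-fold errors (the $\sqrt{K-1}$ divisor follows the suggestion above; dividing by $\sqrt{K}$ instead is also common):

```python
import math
import statistics

def cv_standard_error(fold_errors):
    """Standard error of the mean CV error: treat each of the K fold
    estimates as an (approximately) independent measurement, take the
    sample standard deviation across folds, and divide by sqrt(K - 1)."""
    k = len(fold_errors)
    return statistics.stdev(fold_errors) / math.sqrt(k - 1)

def one_se_rule(mean_errors, se_at_min):
    """Index of the tree picked by the 1-SE rule: the maximum k whose
    mean CV error is within one standard error of the minimum."""
    threshold = min(mean_errors) + se_at_min
    return max(k for k, r in enumerate(mean_errors) if r <= threshold)

# Hypothetical mean CV errors for trees T(0), ..., T(4), and the
# per-fold errors (K = 5) for the minimizing tree T(2):
mean_errors = [0.30, 0.25, 0.24, 0.25, 0.33]
folds_at_min = [0.20, 0.26, 0.22, 0.25, 0.27]

se = cv_standard_error(folds_at_min)       # one number for the minimizing tree
selected = one_se_rule(mean_errors, se)    # largest k within min + 1 SE
```

So the SE is not "only one value" pulled from thin air: it is computed from the $K$ per-fold error estimates of the minimizing tree.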