Solved – Cross-validation for ridge regression is selecting too low a value of the regularization parameter

classification, cross-validation, regularization, ridge regression

I perform ridge regression for classification. To find the regularization parameter, I do K-fold cross-validation with classification accuracy as the measure.
This gives me some $\lambda$, which I then use to train a final model on all of the available training data. The problem is that when I take $10\lambda$, my test accuracy on a separate dataset is much better than with $\lambda$ itself. I cannot see a reason for this. Can someone please tell me why it might happen?

The $\lambda$ I get is $10^4$, and the beta coefficients are about $10^{-3}$. I have about 15,000 features, which I standardize before doing the regression.
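
For concreteness, here is a minimal sketch of the selection procedure described above, assuming scikit-learn's `RidgeClassifier` (its `alpha` parameter plays the role of $\lambda$). The toy data, the $\lambda$ grid, and all variable names are illustrative, not the asker's actual setup.

```python
# Minimal sketch: K-fold CV over a lambda grid, then a final fit.
# All names and data below are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the asker's data (many features, binary labels).
X_train, y_train = make_classification(n_samples=500, n_features=2000,
                                       n_informative=50, random_state=0)

def cv_accuracies(X, y, lambdas, k=10):
    """Per-fold K-fold classification accuracies for each candidate lambda."""
    scores = []
    for lam in lambdas:
        model = make_pipeline(StandardScaler(), RidgeClassifier(alpha=lam))
        scores.append(cross_val_score(model, X, y, cv=k, scoring="accuracy"))
    return np.array(scores)            # shape: (n_lambdas, k)

lambdas = np.logspace(0, 6, 13)        # grid around the reported lambda ~ 1e4
fold_scores = cv_accuracies(X_train, y_train, lambdas)
best_lam = lambdas[np.argmax(fold_scores.mean(axis=1))]

# Final model trained on all available training data with the selected lambda.
final_model = make_pipeline(StandardScaler(), RidgeClassifier(alpha=best_lam))
final_model.fit(X_train, y_train)
```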

Best Answer

  1. Cross-validation estimates are known to have high variance, so there is no guarantee that you will always get exactly the best result.

  2. As Theja has hinted, the search for the best $\lambda$ adapts to the training sample and so can introduce a "second-level overfit". One recommendation is the "one standard error rule": the idea is to increase $\lambda$, taking the variability of the CV estimates into account (a sketch is given after this list). See the CART book and The Elements of Statistical Learning (ESL).

  3. A more reliable way to select $\lambda$ is through the marginal likelihood or "evidence"; see this book. It can get computationally intensive, and I don't see it used very often (a rough illustration is also sketched below).
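
As a concrete illustration of the one-standard-error rule in item 2, here is a hedged sketch that reuses the per-fold accuracies `fold_scores` and the grid `lambdas` from the sketch in the question; the helper name `one_se_lambda` is made up for this answer.

```python
# "One standard error rule" applied to CV accuracies. Larger lambda means a
# more heavily regularized (simpler) model, so among all lambdas whose mean
# accuracy is within one standard error of the best one, we keep the largest.
import numpy as np

def one_se_lambda(lambdas, fold_scores):
    """fold_scores has shape (n_lambdas, n_folds); returns the chosen lambda."""
    lambdas = np.asarray(lambdas)
    fold_scores = np.asarray(fold_scores)
    means = fold_scores.mean(axis=1)
    ses = fold_scores.std(axis=1, ddof=1) / np.sqrt(fold_scores.shape[1])
    best = np.argmax(means)
    within_one_se = means >= means[best] - ses[best]
    return lambdas[within_one_se].max()   # most regularized model within 1 SE

lam_1se = one_se_lambda(lambdas, fold_scores)
```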
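
For item 3, one off-the-shelf option is scikit-learn's `BayesianRidge`, which fits its regularization hyperparameters by maximizing the marginal likelihood (evidence) of a Gaussian regression model. Treating the class labels as $\pm 1$ regression targets gives a rough sketch of the idea, though it is not necessarily the exact method the referenced book describes.

```python
# Rough illustration of evidence-based regularization. BayesianRidge maximizes
# the marginal likelihood of a Gaussian linear model, so the +/-1-coded class
# labels are treated as regression targets here; this is a sketch of the idea,
# not the asker's method. Reuses X_train, y_train from the first sketch.
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X_train)
y_pm1 = np.where(y_train == 1, 1.0, -1.0)   # encode labels as +/-1 targets

br = BayesianRidge().fit(X_std, y_pm1)
# The effective ridge penalty is the ratio of the weight precision (lambda_)
# to the noise precision (alpha_) found by evidence maximization.
print("evidence-based effective lambda:", br.lambda_ / br.alpha_)
```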