Cross-Validation with Regularization – How to Use It

cross-validation, machine-learning, regularization

I think I understand each of these concepts (cross-validation, regularization) independently, but I'm not quite clear on how they can be put together in practice.

Loosely speaking, in cross-validation I train my models on subsets of my data and then choose the model that performs best on the held-out portion. In regularization I heuristically choose some sort of regularizer function and then try to find the value of the parameter $\lambda$ that gives the best results. Can we use cross-validation to pick $\lambda$? I think each different value of $\lambda$ yields a different model, but then don't we have infinitely many models to choose from?
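Concretely, here is the kind of loop I have in mind (just a rough sketch; scikit-learn's Ridge and the seven-point grid are placeholders I picked for illustration):

```python
# A rough sketch of CV over a handful of lambda values; Ridge and the
# seven-point grid are placeholders, not a recommendation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

lambdas = np.logspace(-3, 3, 7)  # but why these seven values and not others?
scores = [cross_val_score(Ridge(alpha=lam), X, y, cv=5).mean() for lam in lambdas]
best = lambdas[int(np.argmax(scores))]
print("best lambda:", best)
```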

Best Answer

You generally do have infinitely many models to choose from. There are two approaches to resolving this difficulty.

  • You can attempt to be very creative and work out the mathematics for computing the full path of models as $\lambda$ varies. This is only possible in some cases, but when it is, it is a powerful method indeed. For example, the LARS algorithm for lasso linear regression is exactly of this type. It is very beautiful when this works out.
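For instance (a minimal sketch, assuming scikit-learn, whose lars_path implements LARS for the lasso), the entire solution path comes from a single call, with no grid of lambdas fixed in advance:

```python
# A minimal sketch assuming scikit-learn: lars_path runs LARS and returns
# the whole lasso solution path, covering every lambda along the way.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

# alphas holds the lambda values where the active set changes;
# coefs[:, k] is the full coefficient vector at alphas[k].
alphas, active, coefs = lars_path(X, y, method="lasso")
print(alphas.shape, coefs.shape)  # one coefficient column per breakpoint
```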

But usually you can't or don't know how to do that, so:

  • You simply discretize the problem by choosing an appropriate finite sequence of lambdas $\lambda_0 < \lambda_1 < \cdots < \lambda_N$ and working only with those values. There is still some art to this, as the right choices of $\lambda_N$ (the maximum) and $\lambda_0$ (the minimum) depend on the problem being solved. You often want to choose $\lambda_N$ to be the smallest value that collapses the model completely, so that it predicts only the average value of the response, with $\lambda_0$ a small fraction of $\lambda_N$ and the grid spaced logarithmically in between. This is the approach taken by the well-known glmnet package, for example.
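As a concrete illustration (a sketch assuming scikit-learn, whose LassoCV follows essentially the same recipe as glmnet), both the grid endpoints and the cross-validated choice are easy to inspect:

```python
# A sketch assuming scikit-learn: LassoCV builds a log-spaced grid of
# n_alphas values, with the largest being the smallest lambda that shrinks
# every coefficient to zero (the model then just predicts the mean), and
# picks the grid point with the best cross-validated error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

model = LassoCV(n_alphas=100, eps=1e-3, cv=5).fit(X, y)
print("lambda_max:", model.alphas_.max())  # collapses the model to the mean
print("lambda_min:", model.alphas_.min())  # eps * lambda_max
print("cross-validated lambda:", model.alpha_)
```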