Solved – Why using cross validation is not a good option for Lasso regression

cross-validation, lasso, machine-learning, regression

I watched the lecture about Lasso, and at the end of this module (between 00:40 and 01:25) she explains how to choose the regularization parameter $\lambda$. It sounds like (K-fold) cross-validation is not a good option for Lasso, but I don't understand why. What's the problem?

Best Answer

So, the point is that when you define an optimal value of $\lambda$, you must ask: optimal for what? In the case of the LASSO, there are two possible goals:

  1. Estimate $\lambda_{\text{pred}}$, the value of $\lambda$ that leads to the best prediction error.

  2. Estimate $\lambda_{\text{ms}}$, the value of $\lambda$ that produces the correct model (or at least something that is close to it).

As Dr. Fox correctly notes, in general it is not the case that $\lambda_{\text{pred}} = \lambda_{\text{ms}}$, and typically $\lambda_{\text{pred}} < \lambda_{\text{ms}}$. But choosing $\lambda$ by cross-validation means choosing it to minimize estimated prediction error, so one would expect it to estimate $\lambda_{\text{pred}}$. Consequently, if you choose $\lambda$ by cross-validation, you may select a $\lambda$ that leads to a model which is too big. If your goal is recovery of the true model, you should therefore be careful when applying cross-validation.
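To make this concrete, here is a minimal simulation sketch (my own illustration, not from the original answer) in Python with scikit-learn. Only 3 of 50 coefficients are truly nonzero, yet the $\lambda$ chosen by 5-fold cross-validation (scikit-learn calls the penalty `alpha`) typically keeps a number of spurious predictors in addition to the true ones; the specific dimensions and coefficients are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, n_true = 200, 50, 3            # 50 candidate predictors, only 3 truly active

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_true] = [3.0, -2.0, 1.5]     # true model uses only the first 3 predictors
y = X @ beta + rng.standard_normal(n)

fit = LassoCV(cv=5).fit(X, y)        # lambda chosen to minimize CV prediction error
selected = np.flatnonzero(fit.coef_)

print("lambda chosen by CV:", fit.alpha_)
print("predictors selected:", selected.size)                      # usually more than 3
print("spurious predictors kept:", int(np.sum(selected >= n_true)))
```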

I personally encounter this issue a lot when writing papers: whenever I do a simulation study looking at the lasso for variable selection, using cross-validation to select $\lambda$ is invariably a disaster. I have had much better luck applying Lasso$(\lambda)$ to select the model, fitting the selected model by least squares, and then applying cross-validation to this entire procedure to select $\lambda$. It's still not ideal, but it is a big improvement.
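For what it's worth, here is a rough sketch of that select-with-lasso, refit-with-least-squares procedure, cross-validating the whole pipeline over a grid of candidate $\lambda$ values. It is my own sketch under the same simulated setup as above, not the answerer's code, and the grid and fold counts are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(200)

def cv_error_lasso_then_ols(X, y, lam, n_splits=5):
    """CV estimate of prediction error for: select variables with Lasso(lam), refit by OLS."""
    errors = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        sel = np.flatnonzero(Lasso(alpha=lam).fit(X[train], y[train]).coef_)
        if sel.size == 0:
            pred = np.full(len(test), y[train].mean())   # empty model: predict the mean
        else:
            ols = LinearRegression().fit(X[train][:, sel], y[train])
            pred = ols.predict(X[test][:, sel])
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

# Choose the lambda whose *entire* select-then-refit procedure predicts best.
lambdas = np.logspace(-2, 1, 20)
best_lam = min(lambdas, key=lambda lam: cv_error_lasso_then_ols(X, y, lam))
print("lambda chosen for the lasso + OLS pipeline:", best_lam)
```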

That's not to say that cross-validation is completely off the table for model selection; it's just that you need to think carefully about which $\lambda$ your method is estimating. For example, let's set the lasso aside and consider a low-dimensional linear regression. In this case, leave-one-out cross-validation is known to be more or less equivalent to a variant of AIC, and AIC is well known to be inconsistent for model selection. Similarly, BIC is generally associated with leave-$V$-out cross-validation, where $V$ is some function of the size of the data, and it is well known that variants of BIC are model-selection consistent. Hence there are ways of doing cross-validation that we would expect to be consistent for model selection, but leave-one-out is not one of them.
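As a small illustration of that last point (again my own sketch, not part of the answer), one can score a nested family of linear models with the usual AIC and BIC formulas: AIC, which leave-one-out cross-validation roughly tracks, charges 2 per extra parameter and tends to over-select, while BIC's heavier $\log(n)$ penalty is the one associated with selection consistency. The data-generating setup below is made up for the example.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(1)
n, p, k_true = 500, 10, 3
X = rng.standard_normal((n, p))
y = X[:, :k_true] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)

def rss(k):
    """Residual sum of squares for the nested model using the first k predictors."""
    if k == 0:
        return float(np.sum((y - y.mean()) ** 2))
    beta, *_ = lstsq(X[:, :k], y, rcond=None)
    return float(np.sum((y - X[:, :k] @ beta) ** 2))

# AIC penalizes each extra parameter by 2; BIC by log(n), which grows with the sample size.
aic = [n * np.log(rss(k) / n) + 2 * k for k in range(p + 1)]
bic = [n * np.log(rss(k) / n) + np.log(n) * k for k in range(p + 1)]
print("model size chosen by AIC:", int(np.argmin(aic)))  # may overshoot the true size of 3
print("model size chosen by BIC:", int(np.argmin(bic)))  # usually exactly 3
```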