Solved – Using LASSO only for feature selection

feature-selection, lasso, regression-strategies

In my machine learning class, we learned that LASSO regression is very good at performing feature selection, since it uses $l_1$ regularization.

My question: do people normally use LASSO just for feature selection (and then feed the selected features into a different machine learning model), or do they typically use LASSO to perform both the feature selection and the final regression?

For example, suppose that you want to do ridge regression, but you believe that many of your features are uninformative. Would it be wise to run LASSO first, keep only the features whose coefficients are not zeroed out (or shrunk to nearly zero), and then fit a ridge regression model on just those features? That way you get the benefit of $l_1$ regularization for feature selection and the benefit of $l_2$ regularization for reducing overfitting. (I know that this basically amounts to elastic net regression, but it seems like you don't need both the $l_1$ and $l_2$ terms in the final objective function.)
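For concreteness, here is a rough sketch of the two-stage pipeline I have in mind, using scikit-learn on synthetic data (the dataset, the near-zero threshold, and the penalty grids are all just illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 features, only 10 of which are truly informative
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize so the l1 penalty treats all features comparably
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Stage 1: LASSO with a cross-validated penalty, used only for selection
lasso = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train)
selected = np.abs(lasso.coef_) > 1e-8  # keep features not zeroed out
print(f"LASSO kept {selected.sum()} of {X_train.shape[1]} features")

# Stage 2: ridge regression fit on the selected features only
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13))
ridge.fit(X_train_s[:, selected], y_train)
print("Test R^2:", ridge.score(X_test_s[:, selected], y_test))
```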

Aside from regression, is this a wise strategy when performing classification tasks (using SVMs, neural networks, random forests, etc.)?
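For classification, I imagine something like the following sketch, again with scikit-learn, where an $l_1$-penalized logistic regression acts as the selector in front of an SVM (both are arbitrary placeholders for whatever selector and final classifier you prefer):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# l1-penalized logistic regression selects features; an SVM does the
# final classification on whatever features survive
clf = make_pipeline(
    StandardScaler(),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear",
                                       C=0.1)),
    SVC(kernel="rbf"),
)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

Wrapping the selector inside the pipeline means it is refit on each training fold, so the cross-validation estimate at least does not leak the selection step into the held-out data.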

Best Answer

Almost any approach that performs some form of model selection and then runs further analyses as if no model selection had taken place tends to have poor properties. Unless there are compelling theoretical arguments, backed up by evidence from, say, extensive simulation studies at realistic sample sizes and feature-to-sample-size ratios, showing that a given case is an exception, such an approach is likely to behave unsatisfactorily. I am not aware of any such positive evidence for the LASSO-then-refit strategy, but perhaps someone else is. Given that there are reasonable alternatives that achieve all the desired goals (e.g. the elastic net), it is hard to justify such a suspect ad-hoc procedure.
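For comparison, here is a minimal scikit-learn sketch of the elastic net alternative, where a single model carries both penalties and cross-validation picks the $l_1$/$l_2$ mix (the data and the l1_ratio grid are just illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)

# One model, both penalties: cross-validation chooses the l1/l2 mix
# (l1_ratio) and the overall penalty strength (alpha) jointly
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, random_state=0)
enet.fit(X, y)
print("chosen l1_ratio:", enet.l1_ratio_, "chosen alpha:", enet.alpha_)
print("nonzero coefficients:", np.sum(enet.coef_ != 0))
```

Because selection and shrinkage happen in one objective, there is no second stage that pretends the selection never occurred.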