Solved – Lasso Regression as Variable Selection

feature selection, lasso, stepwise regression

Suppose we are initially given $p$ predictor variables. In lasso regression, we estimate the coefficients $\beta_1, \dots, \beta_p$ by minimizing $\text{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$, where $\text{RSS}$ is the residual sum of squares. If $\lambda = 0$ we recover ordinary least squares (OLS) regression. As we increase $\lambda$, we are essentially assuming more structure in the data, so we increase the bias and reduce the variance compared to OLS; equivalently, we are placing a Laplace prior on the coefficients. For large enough $\lambda$, some of the estimated coefficients are exactly $0$, so lasso also works as a variable selection procedure.
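As a concrete illustration of the selection behavior described above (a minimal sketch using scikit-learn, not part of the original question; the data and penalty values are made up), the snippet below shows how raising the penalty drives more coefficients exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Only the first 3 of the 10 predictors actually matter.
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(size=n)

# scikit-learn's Lasso minimizes RSS/(2n) + alpha * sum(|beta_j|),
# so alpha plays the role of lambda up to the 1/(2n) scaling of RSS.
for alpha in [0.01, 0.1, 1.0]:
    fit = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: {np.sum(fit.coef_ != 0)} nonzero of {p} coefficients")
```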

Question. Can one first do a lasso regression and then apply some stepwise selection technique for further variable reduction? Does the order in which these techniques are performed matter?

Best Answer

LASSO doesn't just select among a set of predictors as part of a bias-variance tradeoff. In doing so, it also penalizes the coefficients of the selected predictors in a way that minimizes overfitting. If you simply use LASSO to choose predictors and then go on your way without accounting for having already used the outcomes to choose those predictors, you can lose that protection from overfitting. See this page for further details.

In general, stepwise selection of predictors is likely to produce overfitting and poor generalization to other samples of data; there is much discussion on this page, with links to further discussion.

So if you have already selected a set of predictors with LASSO, it's best to stick with them and with their penalized regression coefficients.
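To make the answer's warning concrete, here is a hedged sketch (again scikit-learn, with made-up data; not from the original answer) contrasting the penalized lasso coefficients with an unpenalized OLS refit on the lasso-selected predictors. The refit drops the penalty, and with it the shrinkage that guards against overfitting:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(size=n)

# Choose the penalty by cross-validation, then see which predictors survive.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Refitting plain OLS on the selected predictors discards the penalty:
# the coefficients are typically larger in magnitude than the lasso ones.
ols_refit = LinearRegression().fit(X[:, selected], y)

print("penalized lasso coefficients:", lasso.coef_[selected])
print("unpenalized OLS refit:       ", ols_refit.coef_)
```

Keeping the penalized coefficients, as the answer recommends, retains the shrinkage that the OLS refit throws away.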
