Solved – How to Apply the Iteratively Reweighted Least Squares (IRLS) Method to the LASSO Model

Tags: convex, feature selection, generalized linear model, lasso, logistic

I have programmed a logistic regression using the IRLS algorithm. I would like to apply a LASSO penalization in order to automatically select the right features. At each iteration, the following is solved:

$$\left(X^{T}WX\right)\delta\hat\beta = X^{T}\left(y-p\right)$$
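For reference, here is a minimal sketch of one unpenalized IRLS step in R, assuming a design matrix whose first column is the intercept; the function and variable names are illustrative, not from the original post:

```r
# One unpenalized IRLS step for logistic regression.
# X: n x p design matrix (first column = intercept), y: 0/1 response vector,
# beta: current coefficient vector. All names are illustrative.
irls_step <- function(X, y, beta) {
  eta <- as.vector(X %*% beta)   # linear predictor
  p   <- 1 / (1 + exp(-eta))     # fitted probabilities
  W   <- diag(p * (1 - p))       # IRLS weight matrix
  # Solve (X' W X) delta = X' (y - p) for the Newton step delta
  delta <- solve(t(X) %*% W %*% X, t(X) %*% (y - p))
  as.vector(beta + delta)
}
```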

Let $\lambda$ be a non-negative real number. I am not penalizing the intercept, as suggested in The Elements of Statistical Learning, nor the coefficients that are already zero. Otherwise, I subtract a term from the right-hand side:

$$X^{T}\left(y-p\right) - \lambda\,\mathrm{sign}\left(\hat\beta\right)$$

However, I am unsure whether this is a valid modification of the IRLS algorithm. Is this the right way to do it?
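For concreteness, here is a sketch of the modified right-hand side the question describes; this illustrates the proposal above, not a recommended method, and the function name is hypothetical:

```r
# Right-hand side with the subgradient penalty described above: subtract
# lambda * sign(beta_j), except for the intercept (component 1) and for
# coefficients that are already zero. Sketch of the question's proposal.
penalized_rhs <- function(X, y, p, beta, lambda) {
  pen <- lambda * sign(beta)
  pen[1] <- 0            # do not penalize the intercept
  pen[beta == 0] <- 0    # nor the already-zero coefficients
  as.vector(t(X) %*% (y - p)) - pen
}
```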


Edit: Although I was not confident about it, here is one of the solutions I finally came up with. Interestingly, this solution corresponds to what I now understand about the LASSO: there are two steps at each iteration instead of merely one:

  • the first step is the same as before: we perform one iteration of the algorithm (as if $\lambda=0$ in the formula for the gradient above),
  • the second step is the new one: we apply soft-thresholding to each component of the vector $\beta$ obtained in the first step (except for the component $\beta_0$, which corresponds to the intercept). This is known as the Iterative Soft-Thresholding Algorithm (ISTA); a short sketch follows the formula below.

$$\forall i \geq 1, \beta_{i}\leftarrow\mathrm{sign}\left(\beta_{i}\right)\times\max\left(0,\,\left|\beta_{i}\right|-\lambda\right)$$
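A minimal sketch of the resulting two-step iteration, reusing the hypothetical `irls_step` from above; the threshold is applied to every component except the intercept:

```r
# One ISTA-style iteration: a plain IRLS update (step 1) followed by
# soft-thresholding of every coefficient except the intercept (step 2).
lasso_irls_step <- function(X, y, beta, lambda) {
  beta <- irls_step(X, y, beta)   # step 1: update as if lambda = 0
  j <- seq_along(beta)[-1]        # step 2: skip beta_0 (the intercept)
  beta[j] <- sign(beta[j]) * pmax(0, abs(beta[j]) - lambda)
  beta
}
```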

Best Answer

This problem is typically solved by coordinate descent (see here). That method is numerically safer and more efficient, algorithmically easier to implement, and applicable to a more general array of models (including Cox regression). An implementation is available in the R package glmnet. The code is open source (partly in FORTRAN, partly in R), so you can use it as a blueprint.
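For instance, a logistic LASSO fit with glmnet looks like this, where `x` and `y` are placeholder data; `alpha = 1` selects the LASSO penalty, and cross-validation picks $\lambda$:

```r
library(glmnet)

# x: numeric n x p predictor matrix, y: binary 0/1 response (placeholders).
# alpha = 1 requests the pure LASSO penalty; the intercept is not penalized.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(cvfit, s = "lambda.min")   # coefficients at the CV-optimal lambda
```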