Solved – Why is “relaxed lasso” different from standard lasso

Tags: lasso, optimization, regression, regularization

If we start with a data set $(X,Y)$, apply the Lasso to it, and obtain a solution $\beta^L$, we can apply the Lasso again to the data set $(X_S, Y)$, where $S$ is the set of non-zero indices of $\beta^L$, to obtain a solution $\beta^{RL}$, called the 'relaxed Lasso' solution (correct me if I'm wrong!). The solution $\beta^L$ must satisfy the Karush–Kuhn–Tucker (KKT) conditions for $(X,Y)$, but, given the form of the KKT conditions for $(X_S, Y)$, doesn't it satisfy those as well? If so, what is the point of running the Lasso a second time?
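
To make the KKT reasoning concrete, the stationarity condition can be spelled out; the $\tfrac{1}{2n}$ scaling of the squared-error term is a common convention assumed here, not taken from the question:

$$\frac{1}{n} X^\top (Y - X\beta^L) = \lambda s, \qquad s_j = \operatorname{sign}(\beta^L_j) \ \text{if}\ \beta^L_j \neq 0, \quad |s_j| \le 1 \ \text{otherwise}.$$

Keeping only the rows indexed by $S$ gives $\frac{1}{n} X_S^\top (Y - X_S \beta^L_S) = \lambda s_S$, which is exactly the stationarity condition of the Lasso on $(X_S, Y)$ at the same penalty $\lambda$.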

This question is a follow-up to: Advantages of doing "double lasso" or performing lasso twice?

Best Answer

From Definition 1 of Meinshausen (2007), there are two parameters controlling the solution of the relaxed Lasso.

The first one, $\lambda$, controls the variable selection, whereas the second, $\phi$, controls the level of shrinkage. When $\phi = 1$, the Lasso and the relaxed Lasso coincide (as you said!), but for $\phi < 1$ you obtain a solution whose coefficients are closer to those of an orthogonal projection onto the selected variables (a kind of soft de-biasing).
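
For reference, the relaxed Lasso estimator in Definition 1 of Meinshausen (2007) takes, up to notation, the form

$$\hat\beta^{\lambda,\phi} = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n}\sum_{i=1}^n \Big(Y_i - X_i^\top \{\beta \cdot 1_{\mathcal{M}_\lambda}\}\Big)^2 + \phi\,\lambda\,\|\beta\|_1,$$

where $\mathcal{M}_\lambda$ is the set of variables selected by the Lasso with penalty $\lambda$, $1_{\mathcal{M}_\lambda}$ is its indicator vector (the product is componentwise), and $\phi \in (0, 1]$.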

This formulation actually corresponds to solving two problems:

  1. First, the full Lasso with penalization parameter $\lambda$.
  2. Second, the Lasso on $X_S$, which is $X$ reduced to the variables selected in step 1, with penalization parameter $\lambda\phi$ (see the sketch after this list).
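
A minimal numerical sketch of this two-step procedure, using scikit-learn's `Lasso` (its `alpha` argument plays the role of the penalization parameter, though its objective uses a $\tfrac{1}{2n}$ scaling); the `relaxed_lasso` helper and the synthetic data are illustrative, not from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

def relaxed_lasso(X, y, lam, phi):
    """Two-step relaxed lasso sketch (illustrative helper, not Meinshausen's code).

    Step 1: full lasso with penalty lam selects the support S.
    Step 2: lasso on X[:, S] with the smaller penalty phi * lam
            relaxes the shrinkage on the selected coefficients.
    """
    # Step 1: variable selection with the full lasso.
    step1 = Lasso(alpha=lam).fit(X, y)
    S = np.flatnonzero(step1.coef_)
    if S.size == 0:
        return np.zeros(X.shape[1])
    # Step 2: refit on the selected columns only, with penalty phi * lam.
    step2 = Lasso(alpha=phi * lam).fit(X[:, S], y)
    beta = np.zeros(X.shape[1])
    beta[S] = step2.coef_
    return beta

# Quick demonstration on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(100)

b_lasso = relaxed_lasso(X, y, lam=0.1, phi=1.0)  # phi = 1: ordinary lasso
b_relax = relaxed_lasso(X, y, lam=0.1, phi=0.1)  # phi < 1: less shrinkage
print("lasso  :", np.round(b_lasso[:5], 2))
print("relaxed:", np.round(b_relax[:5], 2))
```

With $\phi = 1$ the two calls return the same fit; with $\phi < 1$ the selected coefficients are shrunk less, moving toward the least-squares fit on $X_S$.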