If we start with a data set $(X,Y)$, apply the Lasso to it, and obtain a solution $\beta^L$, we can apply the Lasso again to the data set $(X_S, Y)$, where $S$ is the set of nonzero indices of $\beta^L$, to obtain a solution $\beta^{RL}$, called the 'relaxed Lasso' solution (correct me if I'm wrong!). The solution $\beta^L$ must satisfy the Karush–Kuhn–Tucker (KKT) conditions for $(X,Y)$, but, given the form of the KKT conditions for $(X_S, Y)$, doesn't it also satisfy those? If so, what is the point of running the Lasso a second time?
This question is a follow up to: Advantages of doing "double lasso" or performing lasso twice?
Best Answer
From Definition 1 of Meinshausen (2007), there are two parameters controlling the solution of the relaxed Lasso.
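For reference, the estimator has roughly the following form (paraphrased, with notation lightly adapted from the paper):

$$
\hat\beta^{\lambda,\phi} \;=\; \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - X_i^\top \{\beta \cdot 1_{\mathcal{M}_\lambda}\}\Big)^2 \;+\; \phi\,\lambda\,\|\beta\|_1,
$$

where $\mathcal{M}_\lambda$ is the active set selected by the ordinary Lasso at penalty $\lambda$, the indicator $1_{\mathcal{M}_\lambda}$ zeroes out all coefficients outside that set, and $\phi \in (0, 1]$.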
The first, $\lambda$, controls the variable selection, whereas the second, $\phi$, controls the shrinkage level. When $\phi = 1$, the Lasso and relaxed-Lasso solutions coincide (as you said!), but for $\phi < 1$ you obtain coefficients closer to the orthogonal projection onto the selected variables (a kind of soft de-biasing).
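To make this concrete, here is a minimal sketch of the two-stage procedure using scikit-learn's `Lasso`. The synthetic data and the values of `lam` and `phi` are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first 3 of 20 coefficients are truly nonzero
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam, phi = 0.1, 0.5  # lambda selects variables; phi < 1 relaxes the shrinkage

# Stage 1: ordinary Lasso at penalty lambda -> gives the selected set S
lasso = Lasso(alpha=lam).fit(X, y)
S = np.flatnonzero(lasso.coef_)

# Stage 2: Lasso on the selected columns only, at the smaller penalty phi * lambda
relaxed = Lasso(alpha=phi * lam).fit(X[:, S], y)

# Embed the relaxed coefficients back into the full coefficient vector
beta_relaxed = np.zeros(p)
beta_relaxed[S] = relaxed.coef_
```

With `phi = 1` the second stage applies the same penalty to the selected variables, while letting `phi` approach 0 moves the solution toward the least-squares fit on $S$; this is why the second fit generally differs from $\beta^L$ even though $\beta^L$ satisfies the KKT conditions of the restricted problem at the original $\lambda$.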
This formulation actually corresponds to solving two problems: