Lasso – Lambda Value Correspondence in LARS Implementing LASSO


We know that a modified LARS can be used to implement LASSO. But what is the corresponding lambda, given that LARS has no lambda parameter?

I found reference here:

LASSO regularisation parameter from LARS algorithm

which said:

At each iteration $k$, the former algorithm finds an optimal couple $(\beta^*, \lambda^*)$ minimising the regularised loss function:
\begin{align}
(\beta^*, \lambda^*) = \operatorname{argmin}_{(\beta,\lambda)} L(\beta,\lambda) = \operatorname{argmin}_{(\beta,\lambda)} \Vert y-X\beta \Vert_2^2 + \lambda \Vert \beta \Vert_1
\end{align}

But wouldn't the $\lambda$ that minimises this loss function simply be $0$?

And another reference

LASSO: Deriving the smallest lambda at which all coefficients are zero

says the descent direction in LARS is the one that minimises the ratio of the change in the squared loss to the change in the $L_1$ norm: $$\dfrac{\nabla_{\vec{s}}\Vert y-X\beta\Vert_2^2}{\nabla_{\vec{s}}\Vert\beta\Vert_1}.$$
From this, I guess $\lambda = 1$ in LARS?

Edit

Here is my understanding of LARS:

  1. We move along the 'angular bisector' (the equiangular direction) of the features in the active set until some new feature forms an equally small angle with the residual.

  2. Then we add this feature to the active set. LARS stops after at most min(number of samples, number of features) steps, when no equiangular direction remains or the residual is orthogonal to the feature space.

  3. Summing each feature's step lengths over all steps gives the final solution $\beta^*.$

Regarding the entire solution path of pairs: does it mean that just before each new feature enters the active set, the current solution $\beta^*_t$ corresponds to a LASSO solution with some regularisation coefficient $\lambda^*_t$? That is, as the algorithm proceeds, the corresponding LASSO $\lambda^*_t$ gets smaller and smaller, and more and more coefficients become nonzero.
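As a numerical check of this picture, here is a minimal sketch using scikit-learn's `lars_path` on synthetic data (assuming scikit-learn is available). Note that sklearn scales the penalty as $\frac{1}{2n}\Vert y-X\beta\Vert_2^2 + \alpha\Vert\beta\Vert_1$, so its $\alpha$ corresponds to $\lambda/(2n)$ in the convention used above:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# method="lasso" runs the modified LARS that traces the lasso path
alphas, active, coefs = lars_path(X, y, method="lasso")

print(alphas)                    # penalties at the knots: strictly decreasing, ending at 0
print(list(active))              # order in which features enter the active set
print((coefs != 0).sum(axis=0))  # the support grows as the penalty shrinks
```

Running this shows exactly the conjectured behaviour: each knot of the path comes with a penalty value, the penalties decrease monotonically, and one more coefficient becomes nonzero at each knot.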

Best Answer

(Modified) LARS gives a sequence of coefficient estimates, call it $(\hat\beta_1,\hat\beta_2,\hat\beta_3,\dots,\hat\beta_k)$. Lasso gives a path $(\tilde\beta_\lambda,\lambda)$ containing the optimal $\tilde\beta_\lambda$ for any $\lambda$, or equivalently, a path $(t,\tilde\beta_t)$ where $\tilde\beta_t$ minimises the residual sum of squares subject to $\|\tilde\beta_t\|_1\leq t$.

The important claim about LARS and lasso is that

  • every $\hat\beta_i$ is also a $\tilde\beta_{\lambda_i}$ for some decreasing sequence of penalties $\lambda_i$ (equivalently, a $\tilde\beta_{t_i}$ for an increasing sequence $t_i$)
  • the path $(t,\tilde\beta_t)$ is linear between the values picked out by (modified) LARS; see the sketch below
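The second bullet can be checked numerically: pick a penalty strictly between two consecutive knots, fit the lasso there, and compare against the linear interpolation of the two knot solutions. A sketch, again with scikit-learn and the same caveat that its `alpha` is $\lambda/(2n)$ in the question's convention (`lars_path` reports its knots on that same scale):

```python
import numpy as np
from sklearn.linear_model import Lasso, lars_path

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0, 0.5]) + 0.3 * rng.standard_normal(n)

alphas, _, coefs = lars_path(X, y, method="lasso")

# a penalty strictly between the second and third knots
a_mid = 0.5 * (alphas[1] + alphas[2])
beta_mid = Lasso(alpha=a_mid, fit_intercept=False,
                 tol=1e-12, max_iter=100_000).fit(X, y).coef_

# linear interpolation of the two knot solutions in alpha
w = (alphas[1] - a_mid) / (alphas[1] - alphas[2])
beta_interp = (1 - w) * coefs[:, 1] + w * coefs[:, 2]

print(np.allclose(beta_mid, beta_interp, atol=1e-6))  # True: piecewise linear
```

This works because between knots the active set is fixed, and with a fixed active set the lasso solution is an affine function of the penalty.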

If you want to know which $\hat\beta_i$ corresponds to a given $t$, it's easy: $t=\|\hat\beta_i\|_1$. If you want to know which $\lambda$ corresponds to a given $\hat\beta_i$, you can work it out from the soft-thresholding property: the elements of $(X^TX\hat\beta_i-X^Ty)$ at the coordinates where $\hat\beta_i$ is nonzero are all equal to $\lambda/2$ in absolute value.
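Here is a quick numerical check of this recovery rule, using the same scikit-learn scaling as in the sketches above (so $\lambda = 2n\alpha$):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0]) \
    + 0.5 * rng.standard_normal(n)

alphas, _, coefs = lars_path(X, y, method="lasso")

for i in range(len(alphas)):
    beta = coefs[:, i]
    grad = X.T @ (X @ beta - y)   # = X^T X beta - X^T y
    nz = beta != 0
    lam = 2 * n * alphas[i]       # lambda for the loss ||y - Xb||^2 + lambda * ||b||_1
    # on the active set these entries all equal lambda/2 in absolute value
    print(np.allclose(np.abs(grad[nz]), lam / 2))
```

Every line prints `True`, confirming that each LARS knot is a lasso solution and that its $\lambda$ can be read off from the gradient on the active set.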