Solved – LASSO relationship between $\lambda$ and $t$

lagrange-multipliers, lasso, optimization, regularization

My understanding of LASSO regression is that the regression coefficients are selected to solve the minimisation problem:

$$\min_\beta \|y - X \beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \leq t$$

In practice this is done using a Lagrange multiplier, making the problem to solve

$$\min_\beta \|y - X \beta\|_2^2 + \lambda \|\beta\|_1$$

What is the relationship between $\lambda$ and $t$? Wikipedia unhelpfully states only that it is "data dependent".

Why do I care? Firstly for intellectual curiosity. But I am also concerned about the consequences for selecting $\lambda$ by cross-validation.

Specifically, if I'm doing $n$-fold cross-validation, I fit $n$ different models, each to a different partition of my training data, and then compare the accuracy of each model on its held-out data for a given $\lambda$. But the same $\lambda$ implies a different constraint ($t$) for different subsets of the data (i.e., $t = f(\lambda)$ is "data dependent").

Isn't the cross-validation problem I really want to solve to find the $t$ that gives the best bias-variance trade-off?

I can get a rough idea of the size of this effect in practice by calculating $\|\beta\|_1$ for each cross-validation split and each $\lambda$, and looking at the resulting distribution. In some cases the implied constraint $t$ can vary quite substantially across my cross-validation subsets, where by "substantially" I mean the coefficient of variation of $t$ is well above zero.
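The check described above can be sketched as follows, assuming synthetic data and scikit-learn's `Lasso` (whose `alpha` parameter plays the role of $\lambda$, up to scikit-learn's $1/(2n)$ scaling of the squared-error term); the data and the chosen `alpha` are illustrative, not from the original question:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Synthetic data for illustration only
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(200)

alpha = 0.1  # fixed penalty; sklearn's alpha plays the role of lambda
norms = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Lasso(alpha=alpha).fit(X[train_idx], y[train_idx])
    # implied constraint t = ||beta||_1 on this training subset
    norms.append(np.abs(model.coef_).sum())

# coefficient of variation of the implied t across folds
cv_of_t = np.std(norms) / np.mean(norms)
```

If `cv_of_t` is well above zero, the same `alpha` is imposing noticeably different constraints $t$ on the different folds.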

Best Answer

This is the standard closed-form solution for ridge regression (the lasso has no closed form, but the same reasoning applies to its $L_2$-constrained analogue):

$$ \beta = \left( X'X + \lambda I \right) ^{-1} X'y $$

We also know that the constraint binds at the optimum, so $\|\beta\|_2 = t$; it must therefore be true that

$$ \| \left( X'X + \lambda I \right)^{-1} X'y \|_2 = t, $$

which is not easy to solve for $\lambda$ analytically.

Your best bet is to just keep doing what you're doing: compute the implied $t$ on each sub-sample of the data across multiple $\lambda$ values.