Regression – Comparing KKT and Unconstrained Formulations for Lasso Regression

Tags: lasso, regression, regularization

L1 penalized regression (aka lasso) is presented in two formulations. Let the two objective functions be
$$
Q_1 = \frac{1}{2}||Y - X\beta||_2^2, \\
Q_2 = \frac{1}{2}||Y - X\beta||_2^2 + \lambda ||\beta||_1.
$$
Then the two different formulations are
$$
\text{argmin}_\beta \; Q_1
$$
subject to
$$
||\beta||_1 \leq t,
$$
and, equivalently,
$$
\text{argmin}_\beta \; Q_2.
$$
Using the Karush-Kuhn-Tucker (KKT) conditions, it is easy to see that the stationarity condition for the first formulation is equivalent to taking the (sub)gradient of the second formulation and setting it equal to 0. What I cannot find, nor figure out, is how the complementary slackness condition for the first formulation, $\lambda\left(||\beta||_1 - t\right) = 0$, is guaranteed to be fulfilled by the solution to the second formulation.
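
For concreteness, the KKT conditions I am referring to for the constrained formulation are (writing $\partial ||\beta||_1$ for the subdifferential, since the $l_1$ norm is not differentiable at zero):
$$
\begin{aligned}
&\text{stationarity:} && -X^T(Y - X\beta) + \lambda z = 0, \quad z \in \partial ||\beta||_1, \\
&\text{primal feasibility:} && ||\beta||_1 \leq t, \\
&\text{dual feasibility:} && \lambda \geq 0, \\
&\text{complementary slackness:} && \lambda\left(||\beta||_1 - t\right) = 0.
\end{aligned}
$$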

Best Answer

The two formulations are equivalent in the sense that for every value of $t$ in the first formulation, there exists a value of $\lambda$ for the second formulation such that the two formulations have the same minimizer $\beta$.

Here's the justification:

Consider the lasso formulation: $$f(\beta)=\frac{1}{2}||Y - X\beta||_2^2 + \lambda ||\beta||_1$$ Let the minimizer be $\beta^*$ and let $b=||\beta^*||_1$. My claim is that if you set $t=b$ in the first formulation, then the solution of the first formulation will also be $\beta^*$. Here's the proof:

Consider the first formulation $$\min_\beta \frac{1}{2}||Y - X\beta||_2^2 \;\; \text{s.t.} \;\; ||\beta||_1 \leq b.$$ Suppose, for contradiction, that this constrained formulation has a solution $\hat{\beta}$ with $||\hat{\beta}||_1 < ||\beta^*||_1 = b$ (note the strict inequality). Since $\beta^*$ is feasible for the constraint $||\beta||_1 \leq b$, optimality of $\hat{\beta}$ gives $\frac{1}{2}||Y - X\hat{\beta}||_2^2 \leq \frac{1}{2}||Y - X\beta^*||_2^2$, and $\lambda ||\hat{\beta}||_1 < \lambda ||\beta^*||_1$ whenever $\lambda > 0$. Adding these two inequalities yields $f(\hat{\beta}) < f(\beta^*)$, contradicting the fact that $\beta^*$ minimizes the lasso objective. Hence every solution of the constrained formulation satisfies $||\hat{\beta}||_1 = b = ||\beta^*||_1$; combining this with $f(\hat{\beta}) \geq f(\beta^*)$ gives $\frac{1}{2}||Y - X\hat{\beta}||_2^2 \geq \frac{1}{2}||Y - X\beta^*||_2^2$, so $\beta^*$ attains the constrained minimum as well.

Since $t = b = ||\beta^*||_1$, the constraint is active at $\beta^*$, so the complementary slackness condition $\lambda\left(||\beta^*||_1 - t\right) = 0$ is satisfied at the solution point. (And if $\lambda = 0$, complementary slackness holds trivially, whatever the value of $t$.)

So, given a lasso formulation with $\lambda$, you construct a constrained formulation using a $t$ equal to the value of the $l_1$ norm of the lasso solution. Conversely, given a constrained formulation with $t$, you find a $\lambda$ such that the solution to the lasso will be equal to the solution of the constrained formulation.
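
As a quick numerical sanity check, here is a minimal sketch using cvxpy on synthetic data (the problem sizes, the seed, and the value of $\lambda$ below are arbitrary choices): solve the penalized problem, set $t$ equal to the $l_1$ norm of its solution, solve the constrained problem, and confirm that the two minimizers coincide.

```python
import numpy as np
import cvxpy as cp

# Synthetic data (arbitrary sizes; any X will do for this check)
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n)

lam = 5.0  # an arbitrary penalty level
beta = cp.Variable(p)

# Penalized (lasso) formulation: argmin_beta Q_2
lasso = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Y - X @ beta)
                               + lam * cp.norm1(beta)))
lasso.solve()
beta_star = beta.value
t = np.abs(beta_star).sum()  # b = ||beta*||_1

# Constrained formulation: argmin_beta Q_1 subject to ||beta||_1 <= t
constrained = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Y - X @ beta)),
                         [cp.norm1(beta) <= t])
constrained.solve()

# The two solutions should coincide (up to solver tolerance)
print(np.allclose(beta_star, beta.value, atol=1e-4))
```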

(If you know about subgradients, you can find this $\lambda$ by solving the equation $X^T(Y - X\beta^*) = \lambda z^*$, where $z^* \in \partial ||\beta^*||_1$.)
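
Continuing the sketch above (reusing `X`, `Y`, `beta`, and `lam` from the previous snippet; the active-set tolerance is an arbitrary choice), the subgradient equation lets you read $\lambda$ off the active coordinates, where $z^*_j = \text{sign}(\beta^*_j)$:

```python
# Recover lambda from the solution via the subgradient (stationarity) equation:
# for each active coordinate j (beta_j != 0) we have z*_j = sign(beta_j), so
# |X_j^T (Y - X beta*)| = lambda.
beta_hat = beta.value                      # constrained solution from above
residual_corr = X.T @ (Y - X @ beta_hat)   # X^T (Y - X beta*)
active = np.abs(beta_hat) > 1e-6           # arbitrary tolerance for "nonzero"
lam_recovered = np.abs(residual_corr[active]).mean()
print(lam_recovered)                       # approximately the lam used above
```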