Solved – Derivation of closed form lasso solution


For the lasso problem
$\min_\beta (Y-X\beta)^T(Y-X\beta)$ subject to $\|\beta\|_1 \leq t$, I often see the soft-thresholding result
$$ \beta_j^{\text{lasso}}= \mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\gamma)^+ $$
for the orthonormal $X$ case. It is claimed that the solution can be "easily shown" to take this form, but I have never seen a worked derivation. Has anyone seen one, or perhaps done the derivation themselves?

Best Answer

This can be attacked in a number of ways, including fairly economical approaches via the Karush–Kuhn–Tucker conditions.

Below is a quite elementary alternative argument.

The least-squares solution for an orthonormal design

Suppose $X$ has orthonormal columns, so that $X^T X = I$. Then the least-squares solution is $$ \newcommand{\bls}{\hat{\beta}^{{\small \text{LS}}}}\newcommand{\blasso}{\hat{\beta}^{{\text{lasso}}}} \bls = (X^T X)^{-1} X^T y = X^T y \>. $$
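As a quick sanity check (not part of the derivation itself), here is a minimal sketch assuming numpy, with arbitrary dimensions and random seed: it builds a design with orthonormal columns via a QR factorization and confirms that $(X^T X)^{-1} X^T y$ coincides with $X^T y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5

# QR factorization of a random matrix gives orthonormal columns: Q^T Q = I.
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X = Q
y = rng.normal(size=n)

beta_ls_full = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
beta_ls_orth = X.T @ y                            # shortcut valid when X^T X = I

print(np.allclose(beta_ls_full, beta_ls_orth))    # True
```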

Some equivalent problems

Via the Lagrangian form, it is straightforward to see that an equivalent problem to the one considered in the question (for a suitable correspondence between the bound $t$ and the penalty $\gamma$) is $$ \min_\beta \frac{1}{2} \|y - X \beta\|_2^2 + \gamma \|\beta\|_1 \>. $$

Expanding out the first term and using $X^T X = I$, we get $\frac{1}{2} y^T y - y^T X \beta + \frac{1}{2}\beta^T X^T X \beta = \frac{1}{2} y^T y - y^T X \beta + \frac{1}{2}\beta^T \beta$. Since $y^T y$ does not involve any of the variables of interest, we can discard it and consider yet another equivalent problem, $$ \min_\beta (- y^T X \beta + \frac{1}{2} \|\beta\|^2) + \gamma \|\beta\|_1 \>. $$

Noting that $\bls = X^T y$, the previous problem can be rewritten as $$ \min_\beta \sum_{i=1}^p \left( - \bls_i \beta_i + \frac{1}{2} \beta_i^2 + \gamma |\beta_i| \right) \> . $$

Our objective function is now a sum of terms, each involving a single variable $\beta_i$, so each term may be minimized separately.
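To make the separability concrete, here is a small sketch (again assuming numpy, with illustrative dimensions and an arbitrary choice of $\gamma$): it evaluates the reduced objective $-y^T X \beta + \frac{1}{2}\|\beta\|^2 + \gamma \|\beta\|_1$ at a random point and checks that it equals the sum of the per-coordinate terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, gamma = 50, 5, 0.3

Q, _ = np.linalg.qr(rng.normal(size=(n, p)))   # orthonormal columns
X, y = Q, rng.normal(size=n)
beta_ls = X.T @ y                              # least-squares coefficients
beta = rng.normal(size=p)                      # arbitrary test point

joint = -y @ X @ beta + 0.5 * beta @ beta + gamma * np.abs(beta).sum()
per_coordinate = np.sum(-beta_ls * beta + 0.5 * beta**2 + gamma * np.abs(beta))

print(np.allclose(joint, per_coordinate))      # True
```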

The whole is equal to the sum of its parts

Fix a certain $i$. Then, we want to minimize $$ \mathcal L_i = -\bls_i \beta_i + \frac{1}{2}\beta_i^2 + \gamma |\beta_i| \> . $$

If $\bls_i > 0$, then we must have $\beta_i \geq 0$ since otherwise we could flip its sign and get a lower value for the objective function. Likewise if $\bls_i < 0$, then we must choose $\beta_i \leq 0$.

Case 1: $\bls_i > 0$. Since $\beta_i \geq 0$, $$ \mathcal L_i = -\bls_i \beta_i + \frac{1}{2}\beta_i^2 + \gamma \beta_i \> , $$ and differentiating with respect to $\beta_i$ and setting the result equal to zero gives $\beta_i = \bls_i - \gamma$. This is only feasible if the right-hand side is nonnegative; otherwise $\mathcal L_i$ is increasing on $\beta_i \geq 0$ and the constrained minimum occurs at $\beta_i = 0$. So in this case the solution is $$ \blasso_i = (\bls_i - \gamma)^+ = \mathrm{sgn}(\bls_i)(|\bls_i| - \gamma)^+ \>. $$

Case 2: $\bls_i \leq 0$. This implies we must have $\beta_i \leq 0$, and so $$ \mathcal L_i = -\bls_i \beta_i + \frac{1}{2}\beta_i^2 - \gamma \beta_i \> . $$ Differentiating with respect to $\beta_i$ and setting the result equal to zero gives the stationary point $\beta_i = \bls_i + \gamma$, which for $\bls_i < 0$ equals $\mathrm{sgn}(\bls_i)(|\bls_i| - \gamma)$. Again, to ensure feasibility we need $\beta_i \leq 0$, which is achieved by taking $$ \blasso_i = \mathrm{sgn}(\bls_i)(|\bls_i| - \gamma)^+ \>; $$ this formula also covers the boundary case $\bls_i = 0$, where the constrained minimum is $\blasso_i = 0$.

In both cases, we get the desired form, and so we are done.
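As a hedged numerical check (not part of the original answer; it assumes numpy and uses an arbitrary seed, grid range, and value of $\gamma$), the sketch below applies the closed form $\mathrm{sgn}(\bls_i)(|\bls_i| - \gamma)^+$ and compares it against a brute-force grid minimization of each one-dimensional objective $\mathcal L_i$.

```python
import numpy as np

def soft_threshold(beta_ls, gamma):
    """Coordinate-wise soft-thresholding operator."""
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - gamma, 0.0)

rng = np.random.default_rng(2)
beta_ls = rng.normal(size=5)
gamma = 0.4

# Brute-force minimization of L_i(b) = -b_ls_i * b + 0.5 * b^2 + gamma * |b|
# over a dense one-dimensional grid, one coordinate at a time.
grid = np.linspace(-3, 3, 200001)
brute = np.array([
    grid[np.argmin(-b * grid + 0.5 * grid**2 + gamma * np.abs(grid))]
    for b in beta_ls
])

print(soft_threshold(beta_ls, gamma))
print(brute)   # agrees with the closed form up to the grid resolution
```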

Final remarks

Note that as $\gamma$ increases, each $|\blasso_i|$ is nonincreasing, and hence so is $\|\blasso\|_1$. When $\gamma = 0$, we recover the OLS solution, and for $\gamma > \max_i |\bls_i|$, we obtain $\blasso_i = 0$ for all $i$.
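A quick illustration of this shrinkage path (again just a sketch assuming numpy, with made-up coefficients): the $\ell_1$ norm of the soft-thresholded vector decreases as $\gamma$ grows, equals $\|\bls\|_1$ at $\gamma = 0$, and is exactly zero once $\gamma$ exceeds $\max_i |\bls_i|$.

```python
import numpy as np

beta_ls = np.array([1.5, -0.7, 0.2])   # hypothetical least-squares coefficients
for gamma in [0.0, 0.5, 1.0, 1.6]:
    beta = np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - gamma, 0.0)
    print(gamma, np.abs(beta).sum())    # l1 norm shrinks to 0 once gamma > 1.5
```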