Regression – Why Is My Derivation of a Closed Form Lasso Solution Incorrect?

Tags: lasso, regression, regularization

The lasso problem $$\beta^{\text{lasso}}= \operatorname*{argmin}_\beta \| y-X\beta\|^2_2 + \alpha \| \beta\|_1$$ has the closed form solution: $$ \beta_j^{\text{lasso}}= \mathrm{sgn}(\beta^{\text{LS}}_j)(|\beta_j^{\text{LS}}|-\alpha)^+ $$
if $X$ has orthonormal columns. This was shown in this thread: Derivation of closed form lasso solution.
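
As a quick numerical sanity check of this soft-thresholding formula (my own numpy sketch, not from the linked thread; I assume the $\tfrac{1}{2}\|y-X\beta\|_2^2 + \alpha\|\beta\|_1$ scaling, under which the threshold is exactly $\alpha$):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# X with orthonormal columns via a QR decomposition, so X^T X = I.
n, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
y = rng.normal(size=n)
alpha = 0.3

# Objective in the (1/2)*||y - X beta||^2 + alpha*||beta||_1 convention;
# with this scaling the soft-threshold level is exactly alpha.
def f(beta):
    return 0.5 * np.sum((y - X @ beta) ** 2) + alpha * np.sum(np.abs(beta))

# Closed form: soft-threshold the least-squares solution beta_LS = X^T y.
beta_ls = X.T @ y
beta_closed = np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - alpha, 0.0)

# Derivative-free numerical minimization as a sanity check.
beta_num = minimize(f, np.zeros(p), method="Powell").x

print(np.round(beta_closed, 4))
print(np.round(beta_num, 4))  # should agree to a few decimals
```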

However, I don't understand why there is no closed form solution in general. Using subdifferentials I obtained the following.

($X$ is an $n \times p$ matrix)

$$f(\beta)=\|{y-X\beta}\|_2^2 + \alpha\|{\beta}\|_1$$
$$ =\sum_{i=1}^n (y_i-X_i\beta)^2 + \alpha \sum_{j=1}^p |\beta_j| $$
($X_i$ is the i-th row of $X$)
$$= \sum_{i=1}^n y_i^2 -2\sum_{i=1}^n y_i X_i \beta + \sum_{i=1}^n \beta^T X_i^T X_i \beta + \alpha \sum_{j=1}^p |\beta_j|$$
$$\Rightarrow \frac{\partial f}{\partial \beta_j}= -2\sum_{i=1}^ny_i X_{ij} + 2 \sum_{i=1}^n X_{ij}^2\beta_j + \frac{\partial}{\partial \beta_j}(\alpha |\beta_j|)$$
$$= \begin{cases}
-2\sum_{i=1}^ny_i X_{ij} + 2 \sum_{i=1}^n X_{ij}^2\beta_j + \alpha \text{ for } \beta_j > 0 \\
-2\sum_{i=1}^ny_i X_{ij} + 2 \sum_{i=1}^n X_{ij}^2\beta_j - \alpha \text{ for } \beta_j < 0 \\
[-2\sum_{i=1}^ny_i X_{ij} - \alpha, -2\sum_{i=1}^ny_i X_{ij} + \alpha] \text{ for } \beta_j = 0
\end{cases} $$
With $\frac{\partial f}{\partial \beta_j} = 0$ we get

$$\beta_j = \begin{cases}
\left( 2(\sum_{i=1}^ny_i X_{ij}) - \alpha \right)/ 2\sum_{i=1}^n X_{ij}^2 &\text{for } \sum_{i=1}^ny_i X_{ij} > \alpha \\
\left( 2(\sum_{i=1}^ny_i X_{ij}) + \alpha \right)/ 2\sum_{i=1}^n X_{ij}^2 &\text{for } \sum_{i=1}^ny_i X_{ij} < -\alpha \\
0 &\text{ for }\sum_{i=1}^ny_i X_{ij} \in [-\alpha, \alpha]
\end{cases}$$

Does anyone see where I went wrong?

Answer:

If we write the problem in terms of matrices we can see very easily why a closed form solution only exists in the orthonormal case with $X^TX= I$:

$$ f(\beta)= \| y-X\beta\|^2_2 + \alpha \| \beta\|_1$$
$$= y^Ty -2\beta^TX^Ty + \beta^TX^TX\beta + \alpha \| \beta\|_1$$
$$\Rightarrow \nabla f(\beta)=-2X^Ty + 2X^TX\beta + \nabla(\alpha\| \beta\|_1)$$
(I have taken several steps at once here. However, up to this point this is completely analogous to the derivation of the least squares solution, so you should be able to find the missing steps there.)
$$\Rightarrow \frac{\partial f}{\partial \beta_j}=-2X^T_{j} y + 2(X^TX)_j \beta + \frac{\partial}{\partial \beta_j}(\alpha |\beta_j|) $$

With $\frac{\partial f}{\partial \beta_j} = 0$ we get

$$2(X^TX)_j \beta =2X^T_{j} y - \frac{\partial}{\partial \beta_j}(\alpha |\beta_j|) $$
$$\Leftrightarrow 2(X^TX)_{jj} \beta_j = 2X^T_{j} y - \frac{\partial}{\partial \beta_j}(\alpha |\beta_j|) - 2\sum_{i=1,i\neq j}^p(X^TX)_{ji}\beta_i $$

We can now see that the solution for one $\beta_j$ depends on all the other $\beta_{i\neq j}$, so it isn't clear how to proceed from here. If $X$ is orthonormal, we have $2(X^TX)_j \beta = 2(I)_j \beta = 2\beta_j$, so a closed form solution certainly exists in this case.
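
This coupling is exactly what iterative methods exploit: cyclically solving the scalar equation above for each $\beta_j$ by soft thresholding, with the other coefficients held fixed, is coordinate descent for the lasso. Here is a minimal numpy sketch (not part of the derivation; the function names are mine, and the threshold is $\alpha/2$ because the squared loss here carries no $\tfrac12$ factor):

```python
import numpy as np

def soft_threshold(z, t):
    """sgn(z) * max(|z| - t, 0)"""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for ||y - X beta||_2^2 + alpha*||beta||_1.

    Each sweep solves the scalar equation for beta_j exactly (via soft
    thresholding) while holding the other coefficients fixed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)              # the diagonal entries (X^T X)_{jj}
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual without column j
            rho = X[:, j] @ r_j                      # X_j^T (partial residual)
            # threshold is alpha/2 because the squared loss has no 1/2 factor
            beta[j] = soft_threshold(rho, alpha / 2) / col_sq[j]
    return beta

# small demo: sparse ground truth, non-orthogonal X
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
true_beta = np.array([2.0, -1.0, 0, 0, 0.5, 0, 0, 0])
y = X @ true_beta + 0.1 * rng.normal(size=100)
print(np.round(lasso_coordinate_descent(X, y, alpha=5.0), 3))
```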

Thanks to Guðmundur Einarsson for his answer, on which I elaborated here. I hope this time it's correct 🙂

Best Answer

This is normally done with least angle regression; you can find the paper here.

Sorry about my confusion in the beginning; I am going to make another attempt at this.

So after the expansion of your function $f(\beta)$ you get

$$ f(\beta)=\sum_{i=1}^n y_i^2 -2\sum_{i=1}^n y_i X_i \beta + \sum_{i=1}^n \beta^T X_i^T X_i \beta + \alpha \sum_{j=1}^p |\beta_j| $$

Then you calculate the partial derivative with respect to $\beta_j$. My concern is with your calculation of the partial derivative of the last term before the 1-norm, i.e., the quadratic term. Let's examine it further. We have that:

$$ X_i\beta = \beta^T X_i^T = (\beta_1 X_{i1}+\beta_2 X_{i2}+\cdots+ \beta_p X_{ip}) $$

So you can essentially rewrite your quadratic term as:

$$ \sum_{i=1}^n \beta^T X_i^T X_i \beta = \sum_{i=1}^n (X_i \beta)^2 $$

Now we can use the chain rule to calculate the derivative of this w.r.t. $\beta_j$:

$$ \frac{\partial }{\partial \beta_j} \sum_{i=1}^n (X_i \beta)^2 = \sum_{i=1}^n \frac{\partial }{\partial \beta_j} (X_i \beta)^2 = \sum_{i=1}^n 2(X_i \beta)X_{ij} $$
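
As a quick check of this chain-rule expression (not in the original answer; plain numpy with my own variable names), a finite-difference comparison confirms it, which in matrix form is the $2X^TX\beta$ term from the other derivation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

def quad(b):
    """sum_i (X_i b)^2 = ||X b||^2"""
    return np.sum((X @ b) ** 2)

# chain-rule result: d/d beta_j = 2 * sum_i (X_i beta) * X_ij, i.e. 2 X^T X beta
grad_analytic = 2 * X.T @ (X @ beta)

# central finite differences for comparison
eps = 1e-6
grad_numeric = np.array([
    (quad(beta + eps * np.eye(p)[j]) - quad(beta - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])

print(np.max(np.abs(grad_analytic - grad_numeric)))  # ~1e-7 or smaller
```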

So now your problem does not simplify as easily, because you have all the $\beta$ coefficients present in each equation.

This does not answer your question of why there is no closed form solution of the Lasso; I might add something on that later.