Solved – Lasso regression solutions


For ridge regression, $\hat{\beta}^{ridge}=\operatorname{argmin}\left \{ \frac{1}{2}\sum_{i=1}^{N}(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j)^2 +\lambda \sum_{j=1}^{p}\beta_j^2 \right \}$, the solutions can be computed in closed form from $\hat{\beta}^{ridge}=(X^TX+\lambda I)^{-1}X^Ty$.
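For concreteness, here is a minimal NumPy sketch of that ridge closed form, assuming the data have been centred so that the intercept $\beta_0$ can be dropped; the toy data and variable names are made up for illustration and are not part of the question.

```python
import numpy as np

# Hypothetical toy data (not from the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Centre X and y so the intercept can be dropped from the penalized fit.
X = X - X.mean(axis=0)
y = y - y.mean()

lam = 1.0
p = X.shape[1]

# Closed-form ridge solution: (X^T X + lambda * I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)
```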

But for the lasso regression case: $\hat{\beta}^{lasso}=\operatorname{argmin}\left \{ \frac{1}{2}\sum_{i=1}^{N}(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j)^2 +\lambda \sum_{j=1}^{p}|\beta_j| \right \}$
With which matrix-form formula can I compute the lasso regression solutions? I looked in The Elements of Statistical Learning and couldn't find one; only the closed-form solution for the ridge coefficients is given.

Best Answer

With which matrix-form formula can I compute the lasso regression solutions?

As @Matthew Drury points out, there is no closed-form solution to the multivariate lasso problem. To understand why this is the case, you must first familiarise yourself with the closed-form solution to the univariate lasso problem, which is derived here:


Univariate Lasso problem

Computing the subdifferential of the lasso cost function with respect to $\theta_j$ and setting it to zero to find the minimum, where $\rho_j = \sum_{i} x_{ij}\big(y_i - \sum_{k \neq j} x_{ik}\theta_k\big)$ is the correlation of feature $j$ with the partial residual and $z_j = \sum_{i} x_{ij}^2$ is a normalizing constant:

\begin{aligned} \partial_{\theta_j} RSS^{lasso}(\theta) &= \partial_{\theta_j} RSS^{OLS}(\theta) + \partial_{\theta_j} \lambda || \theta ||_1 \\ 0 & \in -\rho_j + \theta_j z_j + \lambda \, \partial_{\theta_j} |\theta_j| \\ 0 & \in \begin{cases} -\rho_j + \theta_j z_j - \lambda & \text{if}\ \theta_j < 0 \\ [-\rho_j - \lambda ,-\rho_j + \lambda ] & \text{if}\ \theta_j = 0 \\ -\rho_j + \theta_j z_j + \lambda & \text{if}\ \theta_j > 0 \end{cases} \end{aligned}

For the second case, we must ensure that the closed interval contains zero so that $\theta_j = 0$ is a global minimum:

\begin{aligned} 0 \in [-\rho_j - \lambda ,-\rho_j + \lambda ] \end{aligned}

Solving for $\theta_j$ gives:

\begin{aligned} \begin{cases} \theta_j = \frac{\rho_j + \lambda}{z_j} & \text{for} \ \rho_j < - \lambda \\ \theta_j = 0 & \text{for} \ - \lambda \leq \rho_j \leq \lambda \\ \theta_j = \frac{\rho_j - \lambda}{z_j} & \text{for} \ \rho_j > \lambda \end{cases} \end{aligned}

We recognize this as the soft-thresholding function, scaled by the normalizing constant $1/z_j$.
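As a small illustration (not part of the original answer), the piecewise solution above can be written directly as a helper function; `soft_threshold` is a hypothetical name, and $\rho_j$, $z_j$ are assumed to be computed from the data as in the derivation.

```python
def soft_threshold(rho, lam):
    """Piecewise solution above: shrink rho toward zero by lam, or set it to zero."""
    if rho < -lam:
        return rho + lam
    elif rho > lam:
        return rho - lam
    return 0.0

# Univariate lasso solution for coordinate j, with z_j as the normalizing constant:
# theta_j = soft_threshold(rho_j, lam) / z_j
print(soft_threshold(3.0, 1.0), soft_threshold(0.5, 1.0), soft_threshold(-3.0, 1.0))
# -> 2.0 0.0 -2.0
```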

Multivariate Lasso problem

$$ \min_{\theta} \ \frac{1}{2} \ || \mathbf{y - X \theta}||_2^2 + \lambda || \theta||_1$$

Taking the subdifferential and requiring that zero is contained in it (i.e. the stationarity condition for a global minimum) gives:

$$ 0 \in -\mathbf{X^T(y - X \theta)} + \partial (\lambda || \theta||_1)$$

Re-arranging we get

$$ \mathbf{X^T(y - X\theta)} \in \lambda \, \partial || \theta ||_1$$

Here $\theta$ appears on both sides of the relation, so it cannot be isolated in general. You can only solve this explicitly if $X^TX = I$, i.e. the columns of $X$ are orthonormal, since then each component of $\theta$ decouples and you can use the soft-thresholding function to write an explicit solution for each component of $\theta$.
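In the general case, the practical way to use the univariate result is an iterative scheme rather than a formula. As a rough sketch (cyclical coordinate descent is not described in the original answer, but it simply applies the soft-thresholding update derived above to one coordinate at a time), assuming centred data and hypothetical names:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclical coordinate descent: apply the univariate soft-thresholding
    update derived above to one coordinate at a time."""
    n, p = X.shape
    theta = np.zeros(p)
    z = (X ** 2).sum(axis=0)                  # z_j = sum_i x_ij^2
    for _ in range(n_iter):
        for j in range(p):
            # rho_j: correlation of feature j with the residual excluding feature j
            residual_without_j = y - X @ theta + X[:, j] * theta[j]
            rho_j = X[:, j] @ residual_without_j
            # Soft-thresholding cases from the univariate solution
            if rho_j < -lam:
                theta[j] = (rho_j + lam) / z[j]
            elif rho_j > lam:
                theta[j] = (rho_j - lam) / z[j]
            else:
                theta[j] = 0.0
    return theta
```

Note that when $X^TX = I$, every $z_j = 1$ and $\rho_j = x_j^Ty$ regardless of the other coordinates, so a single pass reduces to soft-thresholding the OLS coefficients $X^Ty$ componentwise; that is exactly the explicit orthonormal-case solution mentioned above.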