Solved – mathematical expression that shows how LASSO shrinks coefficients (including some to zero)

Tags: lasso, regularization, ridge regression

By using the singular value decomposition (SVD) $A = UDV^\top$ of the $m \times n$ matrix $A$, I noticed from the derivation that ridge regression shrinks the $j$-th coefficient by the factor $\frac{d_j^2}{d_j^2+\lambda}$, where $d_j$ is the $j$-th singular value on the diagonal of $D$. Moreover, as the penalty term $\lambda$ increases, the amount of shrinkage increases.
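As a quick numerical sketch of that claim (my own illustration with made-up data, assuming NumPy): the ridge solution can be written via the SVD as $V\,\mathrm{diag}\!\left(\frac{d_j}{d_j^2+\lambda}\right)U^\top y$, which matches the normal-equations form and exposes the $\frac{d_j^2}{d_j^2+\lambda}$ shrinkage factors.

```python
import numpy as np

# Made-up data purely for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))   # m x n design matrix
y = rng.normal(size=50)
lam = 2.0

# Thin SVD: A = U diag(d) Vt
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Ridge solution via SVD: beta = V diag(d / (d^2 + lam)) U^T y
beta_ridge = Vt.T @ (d / (d**2 + lam) * (U.T @ y))

# Same solution from the normal equations: (A^T A + lam I)^{-1} A^T y
beta_direct = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ y)
print(np.allclose(beta_ridge, beta_direct))

# Shrinkage factor applied along each singular direction:
print(d**2 / (d**2 + lam))
```

Note that the factors $\frac{d_j^2}{d_j^2+\lambda}$ act along the singular (principal-component) directions; none of them is ever exactly zero for $\lambda < \infty$, which is why ridge shrinks but does not select.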

But, what about LASSO regression? Unlike ridge regression, LASSO regression shrinks some of the coefficients to zero. My question:

  • Is there a way to show, in some mathematical fashion, that LASSO regression shrinks some of the coefficients exactly to zero, the way the expression above shows shrinkage for ridge regression?
    Using the two-predictor case would make it easy to understand. Could you please provide the mathematical steps?

EDIT

Knight & Fu (2000) show that $\hat{\beta}_{lasso}=0$ if and only if $-\lambda I \le 2\sum\limits_{i} Y_i X_i \le \lambda I$.
How does that occur?
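One way to see where a condition of that form comes from (my own sketch, using the subgradient of the absolute value in the one-predictor case, with notation matching the EDIT above):

```latex
% One-predictor lasso objective:
\min_{\beta}\; \sum_i (Y_i - \beta X_i)^2 + \lambda |\beta|
% The absolute value is not differentiable at 0; its subdifferential there
% is the interval [-1, 1]. Optimality of beta = 0 requires
0 \in -2\sum_i Y_i X_i + \lambda \, [-1, 1],
% which holds if and only if
-\lambda \le 2\sum_i Y_i X_i \le \lambda .
```

So $\hat{\beta}_{lasso}=0$ exactly when the (scaled) correlation of the predictor with the response is small enough to be absorbed by the penalty, whereas the ridge stationarity condition is differentiable everywhere and never forces an exact zero.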

References:

Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28(5), 1356–1378.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.

Best Answer

Firstly, I think it's worth noting that the description of what ridge does assumes that the data matrix is orthonormal.
Secondly, the answer to your question is yes under those circumstances. The details may be found in "The Elements of Statistical Learning" on p. 69 (section 3.4.3). The short story is that, in the orthonormal case, the lasso applies soft-thresholding to the least-squares coefficients: $\hat{\beta}_j \to \text{sign}(\hat{\beta}_j)\max(|\hat{\beta}_j|-\lambda, 0)$, so any coefficient with $|\hat{\beta}_j| \le \lambda$ is set exactly to zero. Please see the book for the complete discussion, better formatting, and details.
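To make the soft-thresholding behavior concrete, here is a minimal sketch (my own example values, assuming NumPy) of applying $\text{sign}(b)\max(|b|-\lambda, 0)$ elementwise to a vector of least-squares coefficients:

```python
import numpy as np

def soft_threshold(beta, lam):
    """Elementwise sign(b) * max(|b| - lam, 0)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

# Hypothetical OLS coefficients from an orthonormal design.
beta_ols = np.array([2.5, -0.4, 0.9, -3.0])
print(soft_threshold(beta_ols, lam=1.0))
# Coefficients with |beta| <= lam become exactly zero;
# the rest move toward zero by lam, keeping their sign.
```

This is exactly the mechanism behind the selection property: ridge's multiplicative factor $\frac{d_j^2}{d_j^2+\lambda}$ can make a coefficient small but never zero, while the lasso's subtractive threshold zeroes out every coefficient whose magnitude falls below $\lambda$.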
