Solved – mathematical expression that shows how LASSO shrinks coefficients (including some to zero)

Tags: lasso, regularization, ridge regression

By using the singular value decomposition (SVD) $A = UDV^\top$ of the $m \times n$ matrix $A$, I noticed from the derivation that ridge regression shrinks the $j$-th coefficient by the factor $\frac{d_j^2}{d_j^2+\lambda}$, where $d_j$ is the $j$-th singular value on the diagonal of $D$. Moreover, as the penalty term $\lambda$ increases, the amount of shrinkage increases.
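As a quick numerical sketch of that claim (my own illustration with made-up data, assuming NumPy): the ridge solution can be written via the SVD as $V\,\mathrm{diag}\!\left(\frac{d_j}{d_j^2+\lambda}\right)U^\top y$, which matches the normal-equations form and exposes the $\frac{d_j^2}{d_j^2+\lambda}$ shrinkage factors.

```python
import numpy as np

# Made-up data purely for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))   # m x n design matrix
y = rng.normal(size=50)
lam = 2.0

# Thin SVD: A = U diag(d) Vt
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Ridge solution via SVD: beta = V diag(d / (d^2 + lam)) U^T y
beta_ridge = Vt.T @ (d / (d**2 + lam) * (U.T @ y))

# Same solution from the normal equations: (A^T A + lam I)^{-1} A^T y
beta_direct = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ y)
print(np.allclose(beta_ridge, beta_direct))

# Shrinkage factor applied along each singular direction:
print(d**2 / (d**2 + lam))
```

Note that the factors $\frac{d_j^2}{d_j^2+\lambda}$ act along the singular (principal-component) directions; none of them is ever exactly zero for $\lambda < \infty$, which is why ridge shrinks but does not select.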

But, what about LASSO regression? Unlike ridge regression, LASSO regression shrinks some of the coefficients to zero. My question:

  • Is there a way to show, in some mathematical fashion, that LASSO regression shrinks some of the coefficients exactly to zero, the way the expression above shows shrinkage for ridge regression?
    Using the two-predictor case would make it easy to understand. Could you please provide the mathematical steps?

EDIT

Knight & Fu (2000) show that $\hat{\beta}_{lasso}=0$ if and only if $-\lambda I \le 2\sum\limits_{i} Y_i X_i \le \lambda I$.
How does that occur?
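One way to see where a condition of that form comes from (my own sketch, using the subgradient of the absolute value in the one-predictor case, with notation matching the EDIT above):

```latex
% One-predictor lasso objective:
\min_{\beta}\; \sum_i (Y_i - \beta X_i)^2 + \lambda |\beta|
% The absolute value is not differentiable at 0; its subdifferential there
% is the interval [-1, 1]. Optimality of beta = 0 requires
0 \in -2\sum_i Y_i X_i + \lambda \, [-1, 1],
% which holds if and only if
-\lambda \le 2\sum_i Y_i X_i \le \lambda .
```

So $\hat{\beta}_{lasso}=0$ exactly when the (scaled) correlation of the predictor with the response is small enough to be absorbed by the penalty, whereas the ridge stationarity condition is differentiable everywhere and never forces an exact zero.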

References:

Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28(5), 1356–1378.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.

Best Answer

Firstly, I think it's worth noting that the description of what ridge does assumes that the data matrix is orthonormal.
Secondly, the answer to your question is yes under those circumstances. The details may be found in "The Elements of Statistical Learning" on p. 69 (section 3.4.3). The short story is that, in the orthonormal case, the lasso applies soft-thresholding to the least-squares coefficients: $\hat{\beta}_j \to \text{sign}(\hat{\beta}_j)\max(|\hat{\beta}_j|-\lambda, 0)$, so any coefficient with $|\hat{\beta}_j| \le \lambda$ is set exactly to zero. Please see the book for the complete discussion, better formatting, and details.
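To make the soft-thresholding behavior concrete, here is a minimal sketch (my own example values, assuming NumPy) of applying $\text{sign}(b)\max(|b|-\lambda, 0)$ elementwise to a vector of least-squares coefficients:

```python
import numpy as np

def soft_threshold(beta, lam):
    """Elementwise sign(b) * max(|b| - lam, 0)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

# Hypothetical OLS coefficients from an orthonormal design.
beta_ols = np.array([2.5, -0.4, 0.9, -3.0])
print(soft_threshold(beta_ols, lam=1.0))
# Coefficients with |beta| <= lam become exactly zero;
# the rest move toward zero by lam, keeping their sign.
```

This is exactly the mechanism behind the selection property: ridge's multiplicative factor $\frac{d_j^2}{d_j^2+\lambda}$ can make a coefficient small but never zero, while the lasso's subtractive threshold zeroes out every coefficient whose magnitude falls below $\lambda$.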
