Lasso Regression – Understanding Sparsity in Lasso Regression Geometrically

lasso, regression

Whenever someone writes about Lasso and Ridge regression, they draw this diagram with the circle or with the diamond.

[Image: the usual diagram of loss contours meeting the ridge circle and the lasso diamond]

In the case of the diamond (Lasso regression), it is then always stated that Lasso forces one of the coefficients to 0 and therefore introduces sparsity. I sort of understand it, but whenever I see the diagram my doubts return. Why couldn't one just draw it like this:

[Image: the asker's alternative drawing of the same diagram]

Obviously neither of the coefficients is forced to zero in this case. Both can take values between -1 and 1. What am I missing? My drawing has to be wrong, but I don't get why they always draw it so that it hits $\beta_1=0$.

Edit:

Just found this quote:

However, the lasso constraint has corners at each of the axes and so the ellipse will often intersect the constraint region at an axis

Is that it? It will often intersect the constraint region at a corner, but it doesn't have to? I can't wrap my head around it. I can only imagine that in higher-dimensional cases hitting a corner becomes more likely or even inevitable.

Best Answer

Each circle around your point $\beta$ is actually an isoline of a surface that extends into the third dimension, i.e. upwards, and every point on such a line has the same value of the loss function. You could draw infinitely many such lines, because they are a visual simplification of something that is really a surface.

To answer your question: draw an additional isoline a bit further and you will get one that intersects with the vertices of your square.
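The geometry can be checked numerically. The following is only a minimal sketch under stated assumptions: the contours are taken to be perfect circles (so the loss is the squared distance to the centre), the centre lies outside the diamond, and the function name `lasso_on_diamond`, the budget `t`, and the two example centres are hypothetical choices made for illustration. It brute-forces the point of the diamond $|\beta_1| + |\beta_2| = t$ that the smallest contour touches.

```python
import numpy as np

def lasso_on_diamond(center, t=1.0, n=4001):
    """Return the point on the diamond |b1| + |b2| = t closest to `center`.

    Assumes circular loss contours, so "first contour to touch the diamond"
    is the same as "closest boundary point in Euclidean distance".
    """
    s = np.linspace(0.0, 1.0, n)
    # Corners of the diamond, listed in order and closed back on the first one.
    corners = np.array([[t, 0.0], [0.0, t], [-t, 0.0], [0.0, -t], [t, 0.0]])
    # Sample each of the four edges by linear interpolation between corners.
    edges = [np.outer(1 - s, corners[i]) + np.outer(s, corners[i + 1])
             for i in range(4)]
    boundary = np.vstack(edges)
    dist2 = np.sum((boundary - center) ** 2, axis=1)
    return boundary[np.argmin(dist2)]

# Centre far out along the beta_1 axis: the contact point is a corner.
print(lasso_on_diamond(np.array([2.0, 0.5])))
# Centre closer to the diagonal: the contact point lies on an edge.
print(lasso_on_diamond(np.array([1.0, 0.8])))
```

In this sketch the first centre lands on the corner $(1, 0)$, so $\beta_2 = 0$, while the second lands near $(0.6, 0.4)$ on an edge, with both coefficients nonzero. Both drawings are therefore possible; which one applies depends on where the centre of the contours sits relative to the diamond.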

It is not true that the lasso forces parameters to be zero immediately. What is true is that the lasso shrinks the parameters toward zero as a function of $\alpha$, the lasso coefficient, with individual parameters reaching exactly zero once $\alpha$ is large enough.
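The dependence on $\alpha$ can be made concrete in one special case. This is a sketch, not a general solver: it assumes an orthonormal design ($X^\top X = I$) and the objective $\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1$, for which the lasso solution is the soft-thresholded least-squares estimate. The helper name `soft_threshold` and the values in `beta_ols` are hypothetical.

```python
import numpy as np

def soft_threshold(b, alpha):
    """Shrink each entry of b toward zero by alpha, clipping at exactly zero."""
    return np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0)

# Hypothetical least-squares estimates (the centre of the contours).
beta_ols = np.array([2.0, 0.5])

# Trace the coefficients as the penalty grows.
for alpha in [0.0, 0.25, 0.5, 1.0, 2.0]:
    print(alpha, soft_threshold(beta_ols, alpha))
```

Under these assumptions, a small $\alpha$ leaves both coefficients nonzero, the smaller coefficient becomes exactly zero once $\alpha$ reaches its absolute least-squares value, and the larger one follows at a larger $\alpha$.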

Here is a picture of the actual path of the parameters on a graph like yours, taken from "Lasso regression feature selection":

[Image: path of the lasso parameters drawn on the constraint diagram]

And here is a different visualization taken from: Graphical path Coordinate Descent in case of semi-differentiable functions such as Lasso in 3D

[Image: 3D visualization of the coordinate descent path for the lasso]
