Solved – Lasso regression feature selection

lasso, regression, ridge regression

I have been reading many articles on LASSO regression, and everyone claims that LASSO addresses multicollinearity, showing contour plots of the cost function touching the corner of the diamond ($|\beta_1| + |\beta_2|$). I don't understand how this phenomenon translates to addressing multicollinearity when there are many groups of correlated variables in the data.

Verbatim from one of the research papers: "If there are grouped variables (highly correlated between each other) LASSO tends to select one variable from each group ignoring the others".

Best Answer

"Everyone claims that LASSO addresses multicollinearity"

Lasso regression can set one or more of your feature coefficients exactly to zero, and how many depends on the penalty parameter $\lambda$: the larger $\lambda$ is, the more coefficients are zeroed. Under multicollinearity, i.e. when many features are correlated with each other, this is useful because the lasso tends to set some of the correlated features to zero and leave the others to do their job, reducing the number of correlated variables in the model.
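For reference, one common way to write the objective that $\lambda$ controls is shown below. The $\frac{1}{2n}$ scaling and the symbols $X$, $y$, $n$, $p$ are assumptions made here for illustration; conventions vary between texts and libraries.

$$\hat\beta(\lambda) = \arg\min_{\beta} \; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$$

In the special case of standardized, mutually uncorrelated features ($X^\top X / n = I$), the minimizer has a closed form that shows where the exact zeros come from:

$$\hat\beta_j = \operatorname{sign}(z_j)\,\max\bigl(\lvert z_j \rvert - \lambda,\; 0\bigr), \qquad z_j = \frac{x_j^\top y}{n}$$

Any coefficient with $\lvert z_j \rvert \le \lambda$ is exactly zero. With correlated features there is no such per-coordinate closed form, but the same thresholding step appears inside iterative solvers such as coordinate descent.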

"I didn't understand how this phenomenon translates to addressing multicollinearity when there are many groups of correlated variables in the data"

As @Gavin Simpson points out, the lasso tends to select one variable from a group of correlated variables, but this depends on the value of $\lambda$ chosen and on the specifics of your dataset. It is not a rule or a certainty.
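To make the dependence on $\lambda$ concrete, here is a minimal sketch in plain Python, not taken from the original answer. It assumes an idealized setup: four standardized features in two groups (A and B), correlation 0.9 within each group and 0 between groups, and made-up feature/response correlations. The names `lasso_cd`, `GRAM`, `CORR_Y` and `GROUPS` are hypothetical, and the solver is a bare-bones coordinate descent rather than any library's implementation.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(gram, corr_y, lam, sweeps=2000):
    """Coordinate descent for 0.5*b'Gb - c'b + lam*||b||_1.

    gram   : G = X'X / n for standardized features (unit diagonal)
    corr_y : c = X'y / n
    """
    p = len(corr_y)
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the response after removing
            # the contribution of all other features.
            z = corr_y[j] - sum(gram[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(z, lam) / gram[j][j]
    return beta


# Hypothetical numbers, chosen only for illustration.
RHO = 0.9
GRAM = [
    [1.0, RHO, 0.0, 0.0],
    [RHO, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, RHO],
    [0.0, 0.0, RHO, 1.0],
]
CORR_Y = [1.0, 1.2, 0.8, 0.5]
GROUPS = {"A": [0, 1], "B": [2, 3]}


def selected_per_group(lam):
    """Count the non-zero coefficients in each group at penalty lam."""
    beta = lasso_cd(GRAM, CORR_Y, lam)
    return {name: sum(1 for j in idx if beta[j] != 0.0)
            for name, idx in GROUPS.items()}


for lam in (0.02, 0.3, 1.0):
    print(lam, selected_per_group(lam))
```

With these particular numbers, a small penalty keeps both variables in each group, an intermediate one keeps exactly one per group, and a large one drops group B entirely. The "one variable per group" behaviour only holds for a range of $\lambda$, and real data with correlation across groups can behave less cleanly.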

An example

Here is an example in 2D, since anything more is hard to visualize. $x_1$ and $x_2$ are highly correlated, which you can see from the shape of the OLS contour plot on the bottom right.

[Figure: contour plots showing the lasso solution (red dot); the OLS contours are in the bottom-right panel.]

At some value of $\lambda$, the lasso solution (the red dot) has one coefficient equal to 0 ($\beta_1$) and the other equal to $-2.5$ ($\beta_2$), so you could say that this particular value of $\lambda$ selects only one feature from a group of two correlated features. The same applies in higher dimensions.
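The corner behaviour can also be checked numerically. The sketch below is not the code behind the figure and does not use its numbers. It assumes two standardized features with a hypothetical correlation of 0.9 and made-up feature/response correlations, and it uses the budget form of the lasso (minimize the OLS loss subject to $|\beta_1| + |\beta_2| \le t$, which is equivalent to the penalized form, with a smaller $t$ corresponding to a larger $\lambda$). The function names are hypothetical, and the search is a brute-force scan of the diamond's boundary rather than a proper solver.

```python
RHO = 0.9            # hypothetical correlation between x1 and x2
C1, C2 = 1.0, 1.2    # hypothetical feature/response correlations


def loss(b1, b2):
    """OLS loss (up to an additive constant) for standardized features."""
    return 0.5 * (b1 * b1 + 2.0 * RHO * b1 * b2 + b2 * b2) - C1 * b1 - C2 * b2


def best_on_diamond(t, n=1000):
    """Brute-force search over the boundary |b1| + |b2| = t.

    Valid only while the unconstrained OLS solution lies outside the
    diamond, so that the constrained minimum sits on the boundary.
    """
    best = None
    for k in range(-n, n + 1):
        b1 = t * k / n
        rest = t - abs(b1)
        for b2 in (rest, -rest):
            value = loss(b1, b2)
            if best is None or value < best[0]:
                best = (value, b1, b2)
    return best[1], best[2]


print(best_on_diamond(0.7))   # tight budget
print(best_on_diamond(1.5))   # looser budget
```

For these numbers the tight budget puts the minimum on a corner of the diamond, with $\beta_1$ exactly zero, while the looser budget moves it onto an edge where both coefficients are non-zero. That is the same picture as the contour touching the corner: it happens for some values of the budget (or of $\lambda$), not for all of them.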

Code to generate the figure can be found here