Logistic regression with lasso versus PCA

feature selection, lasso, pca, regression, regularization

Got asked this question in an interview. I know the main difference is that Lasso is a regularization technique (adding a penalty term to limit the effect of large coefficients) while PCA is a feature selection technique (by decomposition of the covariance matrix).

I answered that PCA allows you to do feature selection outside of the fit and transform and therefore gives more flexibility in the hyperparameter search. Whereas in lasso the "feature selection" is kind of done for you and therefore there is less scope for hyperparameter optimization.

Does that sound right?

Best Answer

> I answered that PCA allows you to do feature selection outside of the fit and transform and therefore gives more flexibility in the hyperparameter search.

PCA can be used as a dimensionality reduction technique if you drop principal components based on a heuristic (for example, a threshold on explained variance), but it offers no feature selection: what you retain are principal components, each of which is a linear combination of all the original features, not a subset of those features. That said, tuning the number of retained principal components as a hyperparameter should work better than relying on a heuristic, unless there are many low-variance components and you are simply interested in filtering them out.
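To make the point about tuning the number of components concrete, here is a minimal sketch using scikit-learn. The synthetic dataset, the candidate grid, and the step names (`scale`, `pca`, `clf`) are assumptions chosen purely for illustration; they are not part of the original discussion.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative: 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# PCA sits inside the pipeline, so it is refit on each training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Treat the number of retained components as a hyperparameter
# instead of fixing it with a variance heuristic.
candidates = [2, 5, 10, 15, 20]
search = GridSearchCV(pipe, {"pca__n_components": candidates}, cv=5)
search.fit(X, y)

best_k = search.best_params_["pca__n_components"]
fitted_pca = search.best_estimator_.named_steps["pca"]

# Each retained component has one loading per original feature, so every
# original feature still has to be measured: reduction, not selection.
print(best_k, fitted_pca.components_.shape)
```

Note that `components_` has one column per original feature regardless of how many components are kept, which is why dropping components does not remove any input feature.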

> Whereas in lasso the "feature selection" is kind of done for you and therefore there is less scope for hyperparameter optimization.

LASSO ($\ell_1$ regularization), on the other hand, can intrinsically perform feature selection: the coefficients of the predictors are shrunk towards zero, and some of them are set exactly to zero, which removes those predictors from the model. It still requires hyperparameter tuning, because there is a regularization coefficient that controls how heavily the penalty is weighted in the loss function.
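As a rough illustration of both points (sparsity and the need to tune the penalty), the sketch below fits an $\ell_1$-penalized logistic regression in scikit-learn for several penalty strengths. The data and the grid of `C` values are assumptions for illustration; in scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative: 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
# Scaled once up front for brevity; in practice, scale inside a pipeline.
X_scaled = StandardScaler().fit_transform(X)

# Count surviving (non-zero) coefficients for each penalty strength.
Cs = [0.01, 0.1, 1.0, 10.0]
n_nonzero = {}
for C in Cs:
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X_scaled, y)
    n_nonzero[C] = int(np.count_nonzero(model.coef_))
print(n_nonzero)

# The penalty strength is itself a hyperparameter, chosen here by 5-fold CV.
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"), {"C": Cs}, cv=5
)
search.fit(X_scaled, y)
best_C = search.best_params_["C"]
print(best_C)
```

The features whose coefficients are exactly zero at the chosen `C` are the ones the model has effectively dropped.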


As @MatthewDrury commented, ordinary PCA is agnostic to the target variable, while LASSO is not, because its penalty is applied as part of fitting the regression model. This is actually the most important difference.
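One way to see the target-agnostic behaviour is to shuffle the labels and refit both methods. The sketch below is only an illustration on assumed synthetic data, and `l1_coefs` is a hypothetical helper defined here: the PCA components should be unchanged, since scikit-learn's `PCA.fit` ignores `y`, while the $\ell_1$-penalized coefficients should change.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)

# A second target with no real relationship to X.
rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y)

# PCA accepts y only for API compatibility and never uses it.
pca_a = PCA(n_components=3).fit(X, y)
pca_b = PCA(n_components=3).fit(X, y_shuffled)
same_components = np.allclose(pca_a.components_, pca_b.components_)


def l1_coefs(features, target):
    """Hypothetical helper: coefficients of an l1-penalized logistic fit."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    return model.fit(features, target).coef_.ravel()


# The l1 fit depends on the target, so its coefficients change.
same_coefs = np.allclose(l1_coefs(X, y), l1_coefs(X, y_shuffled))

print(same_components, same_coefs)
```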
