Say I want to estimate a large number of parameters, and I want to penalize some of them because I believe they should have little effect compared to the others. How do I decide what penalization scheme to use? When is ridge regression more appropriate? When should I use lasso?
Lasso vs Ridge – When to Use Each in Regression
lasso, regression, ridge regression
Related Solutions
If you rank 1 million ridge-shrunk, scaled, but non-zero coefficients, you still have to make some kind of decision: you could keep the n best predictors, but what should n be? The LASSO solves this problem in a principled, objective way, because at every step along the regularization path (and in practice you would usually settle on one point, e.g. via cross-validation), only some number m of coefficients are non-zero.
Very often, you will train a model on some data and then later apply it to data not yet collected. For example, you could fit your model on 50,000,000 emails and then use that model on every new email. True, you will fit it on the full feature set for the first 50,000,000 mails, but for every subsequent email you will deal with a much sparser, faster, and much more memory-efficient model. You also won't even need to collect the information for the dropped features, which may be hugely helpful if the features are expensive to extract, e.g. via genotyping.
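To make the sparsity point concrete, here is a minimal sketch (scikit-learn on synthetic data; the email corpus above is just an illustration): cross-validation picks a point on the lasso path, only the surviving features are needed for future predictions, while ridge leaves every coefficient non-zero.

```python
# Minimal sketch: pick a point on the lasso path by cross-validation and
# count how many features survive. Synthetic data stands in for the
# "many features, few relevant" setting described above.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, Ridge

X, y = make_regression(n_samples=500, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

lasso = LassoCV(cv=5).fit(X, y)        # cross-validation chooses the penalty
kept = np.flatnonzero(lasso.coef_)     # indices of the non-zero coefficients
print(f"lasso keeps {kept.size} of {X.shape[1]} features")

ridge = Ridge(alpha=1.0).fit(X, y)     # ridge shrinks but (almost surely) never zeros out
print(f"ridge has {np.sum(ridge.coef_ != 0)} non-zero coefficients")

# For future data you only need the surviving columns:
X_new = np.random.default_rng(1).normal(size=(3, X.shape[1]))
predictions = X_new[:, kept] @ lasso.coef_[kept] + lasso.intercept_
```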
Another perspective on the L1/L2 problem, put forward by e.g. Andrew Gelman, is that you often have some intuition about what your problem may be like. In some circumstances, it is possible that reality is truly sparse. Maybe you have measured millions of genes, but it is plausible that only 30,000 of them actually determine dopamine metabolism. In such a situation, L1 arguably fits the problem better.
In other cases, reality may be dense. For example, in psychology, "everything correlates (to some degree) with everything" (Paul Meehl). Preferences for apples vs. oranges probably do correlate with political leanings somehow, and even with IQ. Regularization might still make sense here, but truly zero effects should be rare, so L2 might be more appropriate.
Yes.
Yes.
LASSO is actually an acronym (least absolute shrinkage and selection operator), so it ought to be capitalized, but modern writing is the lexical equivalent of Mad Max. On the other hand, Amoeba writes that even the statisticians who coined the term LASSO now use the lower-case rendering (Hastie, Tibshirani and Wainwright, Statistical Learning with Sparsity). One can only speculate as to the motivation for the switch. If you're writing for an academic press, they typically have a style guide for this sort of thing. If you're writing on this forum, either is fine, and I doubt anyone really cares.
The $L$ notation is a reference to Minkowski norms and $L^p$ spaces. These generalize the notion of taxicab and Euclidean distances to any $p>0$ via the expression: $$ \|x\|_p=\left(|x_1|^p+|x_2|^p+\dots+|x_n|^p\right)^{\frac{1}{p}} $$ Importantly, only $p\ge 1$ defines a metric distance; for $0<p<1$ the triangle inequality fails, so it is not a distance by most definitions.
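A quick numerical check of that formula, assuming NumPy is available; the vector `x` and the values of `p` are arbitrary choices for illustration:

```python
# Compute the Minkowski norm for a few values of p, both by hand from the
# formula above and with numpy.linalg.norm for comparison.
import numpy as np

x = np.array([3.0, -4.0, 1.0])

for p in (1, 2, 3):
    by_hand = np.sum(np.abs(x) ** p) ** (1.0 / p)
    via_numpy = np.linalg.norm(x, ord=p)
    print(f"p={p}: {by_hand:.4f}  (numpy: {via_numpy:.4f})")

# p = 0.5 still yields a number from the same formula, but the resulting
# "norm" violates the triangle inequality, so it is not a metric distance.
p = 0.5
print(np.sum(np.abs(x) ** p) ** (1.0 / p))
```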
I'm not sure when the connection between ridge and LASSO was realized.
As for why there are multiple names, it is simply that these methods were developed in different places at different times. A common theme in statistics is that concepts often have multiple names, one for each sub-field in which they were independently discovered (kernel functions vs covariance functions, Gaussian process regression vs Kriging, AUC vs $c$-statistic). Ridge regression should probably be called Tikhonov regularization, since I believe he has the earliest claim to the method. Meanwhile, LASSO was only introduced in 1996, much later than Tikhonov's "ridge" method!
Best Answer
Keep in mind that ridge regression can't zero out coefficients; thus, you either end up including all the coefficients in the model, or none of them. In contrast, the LASSO does both parameter shrinkage and variable selection automatically. If some of your covariates are highly correlated, you may want to look at the Elastic Net [3] instead of the LASSO.
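As an illustration of that last point, here is a small sketch (synthetic data, scikit-learn; the correlation level and penalty values are arbitrary) comparing how the LASSO and the Elastic Net distribute weight across two nearly identical covariates:

```python
# Hypothetical illustration of the correlated-covariates point above:
# duplicate (nearly) a relevant feature and compare how lasso and the
# elastic net spread weight across the correlated pair.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # almost perfectly correlated with x1
x3 = rng.normal(size=n)               # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

print(Lasso(alpha=0.1).fit(X, y).coef_)                      # often loads mostly on one of x1/x2
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)   # tends to spread weight more evenly
```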
I'd personally recommend using the Non-Negative Garrote (NNG) [1], as it is consistent in terms of both estimation and variable selection [2]. Unlike LASSO and ridge regression, the NNG requires an initial estimate that is then shrunk towards the origin. In the original paper, Breiman recommends the least-squares solution for the initial estimate (you may, however, want to start the search from a ridge regression solution and use something like GCV to select the penalty parameter).
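If you want to see the mechanics without the MATLAB code linked below, here is a rough Python sketch of the idea, not Breiman's implementation: it uses the standard rewriting of the garrote as a non-negative lasso on a design whose columns are rescaled by an initial OLS estimate. The function name `nn_garrote`, the default `alpha`, and the use of scikit-learn are all my own illustrative choices.

```python
# Rough sketch of the non-negative garrote in penalized form:
# find shrinkage factors c_j >= 0 by a non-negative lasso on a rescaled
# design, then return beta_j = c_j * beta_ols_j.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

def nn_garrote(X, y, alpha=0.1):
    """Garrote-shrunk coefficients using an OLS initial estimate (a ridge fit would also work)."""
    beta_init = LinearRegression().fit(X, y).coef_    # initial (least-squares) estimate
    Z = X * beta_init                                 # rescale each column by beta_ols_j
    # Non-negative lasso on Z gives the garrote shrinkage factors c_j >= 0.
    c = Lasso(alpha=alpha, positive=True).fit(Z, y).coef_
    return c * beta_init

# Example on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)
print(nn_garrote(X, y, alpha=0.05))
```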
In terms of available software, I've implemented the original NNG in MATLAB (based on Breiman's original FORTRAN code). You can download it from:
http://www.emakalic.org/blog/wp-content/uploads/2010/04/nngarotte.zip
BTW, if you prefer a Bayesian solution, check out [4,5].
References:
[1] Breiman, L. (1995). Better Subset Regression Using the Nonnegative Garrote. Technometrics, 37, 373-384.
[2] Yuan, M. & Lin, Y. (2007). On the non-negative garrotte estimator. Journal of the Royal Statistical Society (Series B), 69, 143-161.
[3] Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society (Series B), 67, 301-320.
[4] Park, T. & Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103, 681-686.
[5] Kyung, M., Gill, J., Ghosh, M. & Casella, G. (2010). Penalized Regression, Standard Errors, and Bayesian Lassos. Bayesian Analysis, 5, 369-412.