I'm looking for a non-technical definition of the lasso and what it is used for.
Solved – the lasso in regression analysis
lasso, regression, regularization
Related Solutions
From Definition 1 of Meinshausen (2007), there are two parameters controlling the solution of the relaxed Lasso.
The first one, $\lambda$, controls the variable selection, whereas the second, $\phi$, controls the shrinkage level. When $\phi = 1$ the Lasso and the relaxed Lasso are the same (as you said!), but for $\phi < 1$ you obtain coefficients closer to the orthogonal projection (i.e., the least-squares fit) on the selected variables, a kind of soft de-biasing.
This formulation actually corresponds to solving two problems (a small sketch follows the list):
- First, the full Lasso with penalization parameter $\lambda$
- Second, the Lasso on $X_S$, which is $X$ reduced to the variables selected in step 1, with penalization parameter $\lambda\phi$.
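Here is a minimal sketch of that two-step view in Python, assuming scikit-learn is available; the data and the values of `lam` and `phi` are made up for illustration, and note that scikit-learn's `alpha` plays the role of $\lambda$ only up to its $1/(2n)$ scaling convention.

```python
# Sketch of the two-step relaxed-Lasso view:
# step 1 selects variables with penalty lam, step 2 refits with penalty lam * phi.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]            # only 3 truly active variables
y = X @ beta_true + rng.standard_normal(n)

lam, phi = 0.5, 0.3                          # phi < 1 relaxes the shrinkage

# Step 1: full Lasso with penalty lam -> variable selection
step1 = Lasso(alpha=lam).fit(X, y)
S = np.flatnonzero(step1.coef_)              # selected support

# Step 2: Lasso on the selected columns X_S with the smaller penalty lam * phi
step2 = Lasso(alpha=lam * phi).fit(X[:, S], y)

beta_relaxed = np.zeros(p)
beta_relaxed[S] = step2.coef_
print("selected variables:", S)
print("step-1 coefficients:", np.round(step1.coef_[S], 2))
print("relaxed coefficients:", np.round(beta_relaxed[S], 2))  # less shrunk
```

Because the step-2 penalty $\lambda\phi$ is smaller, the refit coefficients sit between the heavily shrunk step-1 estimates and the plain least-squares fit on the selected variables.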
Yes.
LASSO is actually an acronym (least absolute shrinkage and selection operator), so it ought to be capitalized, but modern writing is the lexical equivalent of Mad Max. On the other hand, Amoeba writes that even the statisticians who coined the term LASSO now use the lower-case rendering (Hastie, Tibshirani and Wainwright, Statistical Learning with Sparsity). One can only speculate as to the motivation for the switch. If you're writing for an academic press, they typically have a style guide for this sort of thing. If you're writing on this forum, either is fine, and I doubt anyone really cares.
The $L$ notation is a reference to Minkowski norms and $L^p$ spaces. These generalize the taxicab ($p=1$) and Euclidean ($p=2$) distances to any $p>0$ via $$ \|x\|_p=(|x_1|^p+|x_2|^p+\dots+|x_n|^p)^{\frac{1}{p}} $$ Importantly, only $p\ge 1$ defines a metric distance; for $0<p<1$ the triangle inequality fails, so it is not a distance by most definitions.
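A quick numerical sketch of that failure (the vectors are chosen purely for illustration): for $p=1/2$ the "norm" of a sum exceeds the sum of the norms.

```python
# Sketch: Minkowski "norms" for a few p, showing the triangle-inequality
# failure when p < 1 (so ||.||_p is not a true metric there).
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

for p in (0.5, 1, 2):
    lhs = np.linalg.norm(x + y, ord=p)                       # ||x + y||_p
    rhs = np.linalg.norm(x, ord=p) + np.linalg.norm(y, ord=p)
    print(f"p={p}: ||x+y|| = {lhs:.2f} vs ||x|| + ||y|| = {rhs:.2f}")

# p=0.5 gives ||x+y|| = 4.00 > 2.00 = ||x|| + ||y||: not a metric.
```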
I'm not sure when the connection between ridge and LASSO was realized.
As for why there are multiple names, it's just that these methods were developed in different places at different times. A common theme in statistics is that concepts often have multiple names, one for each sub-field in which they were independently discovered (kernel functions vs covariance functions, Gaussian process regression vs kriging, AUC vs $c$-statistic). Ridge regression should probably be called Tikhonov regularization, since I believe he has the earliest claim to the method. Meanwhile, the LASSO was only introduced in 1996, much later than Tikhonov's "ridge" method!
Best Answer
The LASSO (Least Absolute Shrinkage and Selection Operator) is a regression method that involves penalizing the absolute size of the regression coefficients.
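In the usual Lagrangian form, with a tuning parameter $\lambda \ge 0$ controlling the strength of the penalty, the lasso estimate solves $$ \hat{\beta} = \underset{\beta}{\arg\min}\; \sum_{i=1}^n \Big( y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^p |\beta_j|. $$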
By penalizing (or, equivalently, constraining) the sum of the absolute values of the estimates, you end up in a situation where some of the parameter estimates may be exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero.
This is convenient when we want some automatic feature/variable selection, or when dealing with highly correlated predictors, where ordinary least squares will usually produce regression coefficients that are 'too large'.
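As a rough illustration of both points, here is a minimal sketch with simulated data (all values are made up; scikit-learn's `alpha` is its name for the penalty parameter): increasing the penalty drives more coefficients exactly to zero.

```python
# Sketch: as the lasso penalty grows, more coefficients become exactly zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.array([4.0, -3.0, 2.0] + [0.0] * (p - 3))  # sparse ground truth
y = X @ beta + rng.standard_normal(n)

for alpha in (0.01, 0.1, 1.0):
    fit = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(fit.coef_ == 0.0))
    print(f"alpha={alpha}: {n_zero} of {p} coefficients are exactly 0")
```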
The Elements of Statistical Learning (free download at https://web.stanford.edu/~hastie/ElemStatLearn/) has a good description of the LASSO and related methods.