Solved – Ridge/Lasso Lambda greater than 1

lasso · regression · regularization · ridge regression

I ran ridge and lasso regressions using an algorithm that automatically searches for the optimum lambda.

However, the algorithm couldn't find an optimum lambda between 0 and 1. In some cases the optimum lambda was much higher than 1 (sometimes 4 or 5, or even higher).

What exactly does that mean? I have always read that the optimum lambda is usually just a little above 0 and definitely not higher than 1.

Does it mean that ridge and lasso aren't applicable in that case?

Thanks a lot in advance,
Tobias

Best Answer

Let's work with the lasso. Recall how a lasso regression model is fitted, given $\lambda$:

$$\min_{\beta\in\mathbb{R}^p}\left\{\frac{1}{N}\|y-X\beta\|_2^2+\lambda\|\beta\|_1\right\}$$

The first term in the sum is the mean squared residual. The second term is the 1-norm of the parameter vector (typically not including the intercept $\beta_0$), scaled by $\lambda$.

There is no reason whatsoever these two components should be comparable in magnitude. Your model could fit very well, yielding small residuals, but need large parameters. Or the other way around. Plus, you may or may not first standardize your predictors, which will change the parameter estimates.
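To make this concrete, here is a minimal sketch with made-up data: a model that fits almost perfectly (tiny residuals) but has large true coefficients, so the penalty term is orders of magnitude larger than the loss term.

```python
import numpy as np

# Toy illustration (assumed data): large true coefficients, tiny noise.
rng = np.random.default_rng(1)
N, p = 100, 3
X = rng.normal(size=(N, p))
beta = np.array([500.0, -300.0, 200.0])        # large coefficients
y = X @ beta + rng.normal(scale=0.1, size=N)   # tiny noise -> tiny residuals

# Ordinary least squares fit, just to compare the two terms' magnitudes.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
loss = np.mean((y - X @ beta_hat) ** 2)  # mean squared residual, ~0.01
penalty = np.sum(np.abs(beta_hat))       # 1-norm of coefficients, ~1000

print(loss, penalty)
```

Here the penalty is roughly 100,000 times the loss, so a useful $\lambda$ would have to be tiny; rescale $y$ the other way and a useful $\lambda$ would have to be huge. The "right" range is entirely problem-dependent.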

This applies to the estimate for $\beta$, given $\lambda$. Now, if you optimize $\lambda$, perhaps using cross-validation, this means that a priori you cannot say anything about the likely range of $\lambda$, other than $\lambda\geq 0$.
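You can see the scale-dependence directly with cross-validation. Below is a sketch using scikit-learn's `LassoCV` (which calls the regularization parameter `alpha`); the data is made up. Fitting the same problem with $y$ multiplied by 100 multiplies the CV-optimal lambda by about 100 as well.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ beta + rng.normal(scale=1.0, size=200)

# Cross-validate lambda on the original data and on a rescaled copy.
fit_small = LassoCV(cv=5).fit(X, y)
fit_large = LassoCV(cv=5).fit(X, 100 * y)

# Rescaling y by 100 scales the entire solution path, and with it the
# CV-optimal alpha, by the same factor.
print(fit_small.alpha_, fit_large.alpha_)
```

Nothing about the model changed between the two fits except units, yet the optimal lambda moved by two orders of magnitude. This is why no fixed interval like $[0,1]$ can be correct in general.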

TL;DR: you appear to have misremembered. The optimum $\lambda$ in no way needs to be in some specific interval. Therefore, getting a "surprising" value does not tell you anything about the appropriateness (or not) of your lasso model.

The same of course applies to ridge regression or the elastic net.