Solved – Ridge/Lasso regression negative Lambda

lasso, regression, regularization, ridge regression

I am here to ask something that I think is interesting. I just read about shrinkage using Ridge or Lasso regression, where lambda acts as a penalty that introduces a little bias in exchange for a large reduction in variance. What lambda does is flatten the slope, which makes me wonder about the opposite: what if we increase the slope by choosing lambda < 0? I know we don't do that in shrinkage, so my question is: can lambda < 0 be beneficial as a kind of expansion instead of shrinkage? Are there any cases where it can be applied?

Best Answer

To consider this, let's look at what the lasso estimate of the coefficients is trying to minimize. Suppose $y_i$ is the outcome for observation $i=1,\ldots,n$ and that $x_{ki}$ is the value of covariate $k=1,\ldots,p$ for individual $i$. We are interested in estimating the vector of coefficients $\beta = (\beta_1, \ldots, \beta_p)$, one for each of the $p$ covariates, as well as the intercept $\beta_0$. Then the lasso estimate of $\beta$ is

$\hat{\beta}^{lasso} = \underset{\beta}{\arg\min}\left\{\sum_{i=1}^{n}\left( y_i - \beta_0 - \sum_{k=1}^{p}\beta_k x_{ki}\right)^2 + \lambda \sum_{k=1}^{p} \vert\beta_k\vert \right\}$, for some $\lambda \geq 0$.
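If it helps to see this concretely, here is the objective written out directly in Python (a minimal sketch; the function name and arguments are my own, not from any library):

```python
import numpy as np

def lasso_objective(beta0, beta, X, y, lam):
    """Lasso objective as written above: residual sum of squares
    plus lambda times the L1 norm of the slopes (the intercept
    beta0 is not penalized)."""
    residuals = y - beta0 - X @ beta   # y_i - beta_0 - sum_k beta_k * x_ki
    return np.sum(residuals ** 2) + lam * np.sum(np.abs(beta))
```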

One reason the lasso is used is that highly correlated covariates lead to unstable estimates of their corresponding $\beta$-coefficients when estimated through ordinary least squares (OLS). For instance, if $X_1$ and $X_2$ are highly correlated, then the OLS estimates of $\beta_1$ and $\beta_2$ will vary a lot between samples. This leads to an inflated mean squared error in the estimates. Now, in lasso regression, since $\lambda \geq 0$, the coefficients are shrunk towards 0 because the penalty term "punishes" estimates that are very large. This is, in essence, why the lasso can combat some of the problems of multicollinearity.
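To illustrate the shrinkage, here is a small simulation sketch, assuming numpy and scikit-learn are available; the correlation level, sample size, and penalty strength are arbitrary choices for illustration. (Note that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` corresponds to $\lambda$ only up to that factor.)

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n = 100

# Two highly correlated covariates: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# The outcome depends on both covariates equally.
y = x1 + x2 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)   # alpha plays the role of lambda (up to scaling)

print("OLS coefficients:  ", ols.coef_)    # unstable: can land far from (1, 1)
print("Lasso coefficients:", lasso.coef_)  # shrunk towards 0; often one is exactly 0
```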

But what happens if we force $\lambda < 0$? Well, this is equivalent to continuing to let $\lambda \geq 0$ and instead minimizing:

$\hat{\beta}^{lasso} = \underset{\beta}{\arg\min}\left\{\sum_{i=1}^{n}\left( y_i - \beta_0 - \sum_{k=1}^{p}\beta_k x_{ki}\right)^2 - \lambda \sum_{k=1}^{p} \vert\beta_k\vert \right\}$, for some $\lambda \geq 0$.

(Note the minus before the penalty term, where previously there was a plus.) Now we are instead encouraging the estimated coefficients to be as large as possible. My intuition is that this would be especially true for covariates that are independent of $y_i$: since their least-squares fit is already near zero, the reward for a large $|\beta_k|$ pushes them away from zero even though they carry no information about the outcome. So by forcing $\lambda < 0$, you would get coefficient estimates that are too far away from 0.
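To see the effect numerically, here is a rough sketch that minimizes the modified objective directly (this is not a standard estimator; the settings and names are illustrative assumptions). The covariates are pure noise, so the true slopes are all zero; a positive $\lambda$ shrinks the estimated slopes to zero, while a negative $\lambda$ inflates them away from zero:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 100, 3

# Covariates that are independent of y: the true slopes are all zero.
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def objective(params, lam):
    """Squared-error loss plus lam * L1 penalty on the slopes.
    A negative lam rewards large coefficients instead of penalizing them."""
    beta0, beta = params[0], params[1:]
    resid = y - beta0 - X @ beta
    return np.sum(resid ** 2) + lam * np.sum(np.abs(beta))

for lam in (50.0, 0.0, -50.0):
    fit = minimize(objective, x0=np.zeros(p + 1), args=(lam,), method="Nelder-Mead")
    print(f"lambda = {lam:+5.1f}  ->  slopes = {np.round(fit.x[1:], 3)}")
```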