L2-Norm Regularization – Is There a Closed Form Solution for L2-Norm Regularized Linear Regression?

Tags: regression, regularization

Consider the penalized linear regression problem:
$$
\text{minimize}_\beta \,\,(y-X\beta)^T(y-X\beta)+\lambda \sqrt{\sum \beta_i^2}
$$
Without the square root this problem becomes ridge regression. Note that this is not the LASSO problem, which may be expressed as:
$$
\text{minimize}_\beta \,\,(y-X\beta)^T(y-X\beta)+\lambda \sum \sqrt{ \beta_i^2}
$$
This is also a special case of group LASSO when all coefficients are within one group. Is there a closed form solution to this problem?
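
For reference, the square-root-free (ridge) version does have the well-known closed form
$$
\hat\beta_{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty,
$$
obtained by setting the gradient of its objective to zero.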

Best Answer

You will get the ridge regression solutions, but parametrised differently in terms of the penalty parameter $\lambda$. This holds more generally for convex loss functions.

If $L$ is a convex, differentiable function of $\beta$, let $\beta(\lambda)$ denote the unique minimiser of the strictly convex function $$h(\beta) = L(\beta) + \lambda \|\beta\|_2^2$$ for $\lambda > 0$. Let, furthermore, $s(\lambda) = \|\beta(\lambda)\|_2$.

Consider now the function $$g(\beta) = L(\beta) + 2 \lambda s(\lambda) \|\beta\|_2.$$ Its gradient, for $\beta \neq 0$, is $$Dg(\beta) = DL(\beta) + 2 \lambda s(\lambda) \frac{\beta}{\|\beta\|_2}.$$ If we plug in $\beta(\lambda)$, then $\|\beta(\lambda)\|_2 = s(\lambda)$, so the $s(\lambda)$ factors cancel and we find that $$Dg(\beta(\lambda)) = DL(\beta(\lambda)) + 2 \lambda \beta(\lambda) = Dh(\beta(\lambda)) = 0,$$ because $\beta(\lambda)$ is a stationary point of $h$. Since $g$ is still convex, this shows that $\beta(\lambda)$ is a global minimiser of $g$.
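
As a quick numerical sanity check of this argument (a minimal Python sketch, assuming numpy and scipy are available; it is not part of the original answer), one can compute the ridge solution in closed form, set the $\|\cdot\|_2$ penalty parameter to $2\lambda s(\lambda)$, and verify that a generic optimiser recovers the same coefficients:

```python
# Numerical check: the ridge minimiser beta(lambda) also minimises
# g(beta) = ||y - X beta||^2 + 2*lambda*s(lambda)*||beta||_2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
lam = 1.0

# Ridge solution in closed form: (X'X + lambda*I)^{-1} X'y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
s = np.linalg.norm(beta_ridge)  # s(lambda) = ||beta(lambda)||_2

def g(beta):
    # ||.||_2-penalised objective with penalty parameter 2*lambda*s(lambda).
    r = y - X @ beta
    return r @ r + 2 * lam * s * np.linalg.norm(beta)

beta_l2 = minimize(g, np.ones(3), method="BFGS").x
print(np.allclose(beta_ridge, beta_l2, atol=1e-4))  # expect True
```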

It is possible that $\lambda \mapsto 2\lambda s(\lambda)$ does not map $(0, \infty)$ onto $(0,\infty)$; thus there can be choices of the penalty parameter (when the $\|\cdot\|_2$-penalty rather than the $\|\cdot\|_2^2$-penalty is used) that give minimisers that are not of the form $\beta(\lambda)$ for any $\lambda > 0$. With the squared-error loss (yielding ridge regression) this will be the case for large choices of the penalty parameter, where the $\|\cdot\|_2$-penalty gives the zero solution.
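
To make that last claim concrete (a standard subgradient computation, added here for illustration rather than taken from the answer): with squared-error loss, $\beta = 0$ minimises $(y-X\beta)^T(y-X\beta) + \tilde\lambda \|\beta\|_2$ exactly when
$$
\tilde\lambda \ge 2\|X^Ty\|_2,
$$
because the subdifferential of $\|\cdot\|_2$ at the origin is the unit ball. Penalty parameters at or above this threshold give the zero solution, which equals $\beta(\lambda)$ for no finite $\lambda > 0$ unless $X^Ty = 0$.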