Solved – Any special case where ridge regression can shrink coefficients to zero

lasso, machine learning, ridge regression

Are there any special cases where ridge regression can also lead to coefficients that are exactly zero?
It is widely known that the lasso shrinks coefficients towards, and onto, zero, while ridge regression cannot shrink coefficients to zero.

Best Answer

Suppose, as in the case of least squares methods, you are trying to solve a statistical estimation problem for a (vector-valued) parameter $\beta$ by minimizing an objective function $Q(\beta)$ (such as the sum of squares of the residuals). Ridge Regression "regularizes" the problem by adding a penalty $P(\beta)$ that is a non-negative linear combination of the squares of the components of the parameter; in the standard case, $P(\beta) = \lambda \sum_i \beta_i^2$ with $\lambda \ge 0.$ $P$ is (obviously) differentiable with a unique global minimum at $\beta=0.$

The question asks when it is possible for the global minimum of $Q+P$ to occur at $\beta=0.$ Assume, as in least squares methods, that $Q$ is differentiable in a neighborhood of $0.$ Because $0$ is a global minimum of $Q+P,$ it is also a local minimum, implying all the partial derivatives of $Q+P$ vanish there. The sum rule of differentiation implies

$$\frac{\partial}{\partial \beta_i}(Q(\beta) + P(\beta)) = \frac{\partial}{\partial \beta_i}Q(\beta) + \frac{\partial}{\partial \beta_i}P(\beta) = Q_i(\beta) + P_i(\beta)$$ is zero at $\beta=0.$ But since $P_i(0)=0$ for all $i,$ this implies $Q_i(0)=0$ for all $i,$ making $0$ a critical point of the original objective function $Q.$ In the case of any least squares technique, $Q$ is a convex quadratic, so every critical point is a global minimum. This compels us to conclude that
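To make the criterion concrete: in ordinary least squares (assuming the standard objective $Q(\beta) = \lVert y - X\beta\rVert^2,$ which the argument above does not require but which the question has in mind), the partial derivatives at zero are

$$Q_i(0) = \frac{\partial}{\partial \beta_i}\,\lVert y - X\beta\rVert^2\,\Big|_{\beta=0} = -2\,(X^\top y)_i,$$

so $\beta=0$ solves the ridge problem exactly when $X^\top y = 0$; that is, when every predictor is orthogonal to the response.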

Quadratic regularization of Least Squares procedures ("Ridge Regression") has $\beta=0$ as a solution if and only if $\beta=0$ is a solution of the original unregularized problem.
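As a numerical illustration, here is a minimal sketch (not from the original answer) assuming the standard setup $Q(\beta)=\lVert y - X\beta\rVert^2$ and using scikit-learn's `Ridge` and `Lasso`; the data, penalty strengths, and variable names are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Case 1: generic response. The OLS solution is nonzero, so by the argument
# above the ridge solution is nonzero too -- shrunk, but never exactly zero.
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
print(Ridge(alpha=10.0, fit_intercept=False).fit(X, y).coef_)
print(Lasso(alpha=1.0, fit_intercept=False).fit(X, y).coef_)  # lasso can hit 0

# Case 2: response orthogonal to every column of X, so X^T y = 0 and the
# unregularized minimum is already at beta = 0. Ridge then returns exactly
# zero (up to floating point), matching the "if and only if" conclusion.
y0 = rng.normal(size=100)
y0 -= X @ np.linalg.lstsq(X, y0, rcond=None)[0]  # remove the column-space part
print(X.T @ y0)                                                 # ~ 0
print(Ridge(alpha=10.0, fit_intercept=False).fit(X, y0).coef_)  # ~ 0
```

In the first case the lasso can zero out the weakest coefficient while ridge merely shrinks all three; in the second, ridge returns (numerically) zero only because the unregularized problem already has its minimum there.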