Lasso vs Ridge Regression

bias-variance tradeoff · lasso · machine learning · ridge regression

My question relates to Ridge vs Lasso regression. I know the difference in the cost function (ridge penalizes the sum of squared coefficients, lasso penalizes the sum of absolute values of coefficients). Moreover, I also know that Lasso is able to reduce some coefficients completely to zero, while ridge only shrinks them towards zero.

So my question is whether one can say, from a theoretical perspective, that Lasso should have a lower variance (generalize better) but a higher bias than Ridge because of the above-mentioned property of reducing coefficients completely to zero (assuming, of course, the same strength of regularization for both)?

Thank you.

Best Answer

No.

There are several issues with the way you describe the premise of the question. For one, it is meaningless to say "the same strength of regularization." The fact that you may use the same Greek letter for the regularization parameters in ridge and lasso doesn't make them directly comparable. Just write $\alpha$ for the ridge parameter and $\lambda$ for the lasso parameter to see my point. You can drive coefficients to zero with both methods by cranking up the regularization.

Secondly, you seem to be trying to look at these techniques from a variance/bias point of view, which is not natural in this context. Regularization in these techniques addresses overparameterization. They do it slightly differently: ridge penalizes coefficients increasingly rapidly as they grow, while lasso penalizes them uniformly across their value range. One could say lasso allows some coefficients to stick out farther than ridge does.
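As a small illustration of that difference in shrinkage behavior, here is a sketch under the simplifying assumption of an orthonormal design matrix, where both estimators have closed forms applied coordinate-wise to the OLS coefficients: ridge divides each coefficient by $(1+\alpha)$, while lasso applies soft-thresholding at $\lambda$. The helper names `ridge_shrink` and `lasso_shrink` are made up for this example:

```python
import numpy as np

def ridge_shrink(beta_ols, alpha):
    # Ridge with orthonormal X: proportional shrinkage.
    # Every coefficient is reduced, but none becomes exactly zero.
    return beta_ols / (1.0 + alpha)

def lasso_shrink(beta_ols, lam):
    # Lasso with orthonormal X: soft-thresholding.
    # Coefficients smaller than lam in magnitude are set exactly to zero;
    # larger ones are shifted towards zero by a constant lam.
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([3.0, 0.5, -0.2])

print(ridge_shrink(beta_ols, 1.0))  # all entries shrunk, none exactly zero
print(lasso_shrink(beta_ols, 1.0))  # small entries driven exactly to zero
```

Note that the two parameters act on completely different scales: the large coefficient ends up at 1.5 under ridge but 2.0 under lasso with the numerically equal penalty parameter, which is exactly why "the same strength of regularization" is not a well-defined comparison.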

Related Question