Solved – How does Ridge Regression penalize for complexity if the coefficients are never allowed to go to zero

lasso, regularization, ridge regression

In the context of trying to understand regularization and how it works for ridge regression vs. lasso regression, I've come across two ideas:

  • Both of these methods attempt to reduce generalization error by penalizing a model for complexity.
  • In lasso, coefficients can be driven exactly to zero, so it also operates as a variable selection procedure. In ridge regression, on the other hand, coefficients can become small but never go all the way to zero.

What I can't understand, then, is how ridge regression penalizes for complexity. The number of coefficients remains the same, even if their values shrink.

Isn't the model $\hat{y}=0.01x_1+2x_2+0.03x_3$ just as complex as $\hat{y}=5x_1+4x_2+7x_3$?

How exactly is complexity being evaluated in the case of ridge regression?

Best Answer

The computational complexity of LASSO and ridge is about the same: both are roughly cubic. Note, though, that the ridge penalty targets the magnitude of the coefficients, not their count: ridge adds $\lambda\sum_j \beta_j^2$ to the least-squares objective, so $\hat{y}=0.01x_1+2x_2+0.03x_3$ incurs a far smaller penalty than $\hat{y}=5x_1+4x_2+7x_3$. The problem regularization addresses is sensitivity to noise when estimating coefficients: highly correlated predictors produce coefficient estimates with extremely large variance, rendering them unreliable. LASSO deals with this by keeping one predictor out of a correlated set and eliminating the rest; the problem is that the choice of which to eliminate is essentially arbitrary. Ridge, on the other hand, does not eliminate coefficients but shrinks them, pulling correlated estimates toward one another and stabilizing them. Both penalizations introduce bias, so the task is to balance bias against variance by choosing an optimal penalty parameter.
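To make the variance argument concrete, here is a minimal simulation sketch (not from the original answer; the data-generating process and the penalty strength `alpha` are arbitrary assumptions). It resamples a dataset with two nearly collinear predictors many times and compares the spread of the OLS coefficient estimates with the ridge estimates:

```python
# Minimal sketch: ridge stabilizes coefficient estimates under collinearity.
# The setup (n, reps, alpha, correlation strength) is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n, reps, alpha = 50, 500, 10.0
true_beta = np.array([1.0, 1.0])

ols_coefs, ridge_coefs = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)      # x2 is almost a copy of x1
    X = np.column_stack([x1, x2])
    y = X @ true_beta + rng.normal(size=n)
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=alpha).fit(X, y).coef_)

print("OLS   coef std dev:", np.std(ols_coefs, axis=0))    # large: unstable
print("Ridge coef std dev:", np.std(ridge_coefs, axis=0))  # much smaller
```

The ridge estimates are shrunken (biased toward zero) but none of them is exactly zero, which is precisely the bias-for-variance trade described above.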
LASSO is a good choice if you need to reduce the number of predictors, for example instrumental variables whose individual effects you are not trying to estimate. Otherwise ridge is the better choice if you are interested in the various effects and don't want variables dropped from the model. LASSO changes your model, and probably not in the way you want.
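A companion sketch of the selection behavior, under the same assumed setup: with two nearly identical predictors, LASSO at a moderate penalty will typically zero out one of them (which one is essentially arbitrary), while ridge keeps both coefficients small but nonzero.

```python
# Minimal sketch: LASSO drops one of two nearly identical predictors,
# while ridge keeps both. Penalty values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

print("LASSO coefs:", Lasso(alpha=0.5).fit(X, y).coef_)   # typically one is 0.0
print("Ridge coefs:", Ridge(alpha=10.0).fit(X, y).coef_)  # both shrunk, nonzero
```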
