Why is lasso more robust to outliers compared to ridge?

lasso, outliers, regularization, robust

In my attempt to reason about it intuitively, I concluded that ridge might be more robust to outliers.

Here is my intuitive (loose) reasoning:

If there is an outlier, then to match my prediction to it I might increase the weight on some dimension. When I do that, ridge penalizes that weight more than lasso does and does not let it grow as large. So it seems like ridge is more robust, yet most people say that lasso is more robust to outliers.

So my question is: what is wrong with my thought process, and what is the correct intuitive way to think about it?

Best Answer

Let's first consider what an outlier does to the coefficients:

  • If it has low leverage, nothing;
  • If it has high leverage, it pulls the coefficient towards itself (either increasing or decreasing it).
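The leverage distinction can be illustrated with a small OLS sketch. The data below are hypothetical (points lying exactly on y = 2x), and `slope` is a helper introduced only for this illustration; it uses `np.polyfit` to fit a line with an intercept. One outlier is placed at the centre of the x-range (low leverage) and another far out on the x-axis (high leverage), both 50 units off the true line.

```python
import numpy as np

# Hypothetical clean data lying exactly on y = 2x (illustrative only).
x = np.arange(10, dtype=float)
y = 2.0 * x

def slope(x, y):
    """OLS slope of y on x (fit includes an intercept), via np.polyfit."""
    return float(np.polyfit(x, y, deg=1)[0])

# Low-leverage outlier: x sits at the mean of the data, y is far off the line.
x_low = np.append(x, x.mean())
y_low = np.append(y, 2.0 * x.mean() + 50.0)

# High-leverage outlier: x is far from the other points, y is far off the line.
x_high = np.append(x, 30.0)
y_high = np.append(y, 2.0 * 30.0 - 50.0)

print(f"clean slope:           {slope(x, y):.3f}")
print(f"low-leverage outlier:  {slope(x_low, y_low):.3f}")
print(f"high-leverage outlier: {slope(x_high, y_high):.3f}")
```

With these numbers the low-leverage outlier leaves the slope at 2 (only the intercept moves), while the high-leverage outlier drags the slope down to roughly 0.28.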

When you apply the LASSO penalty to OLS, you penalize the coefficients by the sum of their absolute values. An outlier with sufficient leverage increases or decreases a coefficient, and the penalty changes only linearly with it. This will somewhat increase or decrease the penalty on the other coefficients, but not by much.
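Written out, assuming the standard penalized least-squares form with a tuning parameter λ ≥ 0, n observations and p coefficients (this notation is not in the original answer), the lasso objective and the effect of shifting one coefficient by δ look like this:

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% If an outlier shifts one coefficient from \beta_j to \beta_j + \delta,
% the penalty changes by at most \lambda|\delta|, whatever the size of \beta_j:
\lambda\left(\lvert \beta_j + \delta \rvert - \lvert \beta_j \rvert\right) \le \lambda \lvert \delta \rvert,
\qquad
\frac{\partial}{\partial \beta_j}\, \lambda \lvert \beta_j \rvert = \lambda \,\operatorname{sign}(\beta_j) \quad (\beta_j \neq 0)
```

The marginal penalty λ·sign(β_j) has constant magnitude, which is the "linear" behaviour described above.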

When you apply the ridge penalty, the coefficients are shrunk by the sum of their squares. This means that outlyingness increases not only the OLS loss quadratically, but also the penalty. As such, all the other coefficients might be shrunk considerably more or less (depending on what kind of outlier you're dealing with).
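For comparison, here is the ridge objective under the same assumed notation (tuning parameter λ ≥ 0, n observations, p coefficients; not part of the original answer), together with the change in penalty when one coefficient is shifted by δ:

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

% If an outlier shifts one coefficient from \beta_j to \beta_j + \delta,
% the penalty change depends on both \delta and the current size of \beta_j:
\lambda\left(\beta_j + \delta\right)^2 - \lambda \beta_j^2 = \lambda\left(2\beta_j\delta + \delta^2\right),
\qquad
\frac{\partial}{\partial \beta_j}\, \lambda \beta_j^2 = 2\lambda\beta_j
```

The marginal penalty 2λβ_j grows with the coefficient itself, so a coefficient that an outlier has already pulled far away is penalized ever more steeply.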

This sensitivity of the penalty to changes in the coefficients (and thus to outliers) means that ridge is less robust to outliers than LASSO.
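A quick numeric sketch of that sensitivity, comparing only the two penalty terms (no model is fitted). It assumes a single shared λ of 1.0 and a hypothetical coefficient vector; `penalty_change` and the other helper names are introduced purely for illustration. One coefficient is pulled upward by increasing amounts δ, as a high-leverage outlier might do.

```python
import numpy as np

LAM = 1.0  # hypothetical tuning parameter lambda, shared by both penalties

def l1_penalty(beta):
    """Lasso penalty: lambda * sum(|beta_j|)."""
    return LAM * float(np.sum(np.abs(beta)))

def l2_penalty(beta):
    """Ridge penalty: lambda * sum(beta_j ** 2)."""
    return LAM * float(np.sum(np.square(beta)))

def penalty_change(penalty, beta, j, delta):
    """Change in `penalty` when coefficient j is shifted by delta."""
    shifted = beta.copy()
    shifted[j] += delta
    return penalty(shifted) - penalty(beta)

# Hypothetical coefficients; an outlier pulls beta[0] upward by delta.
beta = np.array([1.0, 0.5, -0.5])

for delta in (0.5, 1.0, 2.0, 4.0):
    d1 = penalty_change(l1_penalty, beta, 0, delta)  # grows like lambda * delta
    d2 = penalty_change(l2_penalty, beta, 0, delta)  # grows like lambda * (2*delta + delta**2)
    print(f"delta={delta:>3}: lasso +{d1:5.2f}, ridge +{d2:5.2f}")
```

Here the lasso penalty rises by exactly δ while the ridge penalty rises by 2δ + δ² (for example 2 versus 8 at δ = 2), mirroring the linear versus quadratic behaviour described above.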