Solved – why ridge regression only decreases the slope and does not increase it

bias, intuition, machine learning, ridge regression

I was following the example below from the 'StatQuest with Josh Starmer' YouTube channel.

The example is pretty simple: the red line is the usual least-squares fit (to the red points), and the blue line is the ridge regression line (fit to the same red points); we sacrifice a bit of fit on the red (training) points so that the line fits all of the data (green + red points) better.

I do understand the above, and it makes sense; but what if all of the remaining real data ends up being above the line? Why does ridge regression assume that the remaining data would be better fit with a smaller slope rather than a larger one?


Best Answer

The sizes of the slopes can actually increase with ridge regression. That is because, with multiple correlated predictor variables, the penalty on the coefficient vector can sometimes be reduced at little cost in fit by shifting weight between coefficients, which means one or some (but clearly not all) of its components are allowed to increase. With simple linear regression (assuming the intercept is not penalized, as is usual) this cannot occur.
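To see why it cannot occur in the one-predictor case: assuming the data are centered (so the unpenalized intercept drops out), the ridge slope has the closed form

$$\hat\beta(\lambda) = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \lambda},$$

whose magnitude shrinks monotonically toward zero as $\lambda$ grows; it can never increase.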

One way of seeing this occur is to plot the coefficient paths as the penalization parameter increases; some examples of such plots can be seen in this post: Coefficients paths – comparison of ridge, lasso and elastic net regression.

Note that in those plots you can see one trace first increase and then ultimately decrease. That is quite typical.
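For a concrete illustration, here is a minimal simulation sketch (the data-generating setup is made up for illustration, and it assumes NumPy and scikit-learn are available) in which the ridge coefficient of a predictor with no effect of its own first rises and then falls as the penalty grows, because it is highly correlated with the predictor that does matter:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative setup (made up for this sketch): x2 is highly correlated
# with x1 but has no effect of its own -- the true coefficients are (1, 0).
rng = np.random.default_rng(0)
n, rho = 10_000, 0.9
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# Trace the coefficients as the penalty grows: beta2 first rises
# (ridge shifts some of x1's weight onto the correlated x2, which is
# cheaper for the squared-norm penalty), then falls back toward zero.
for alpha in [1e-6, 500, 2000, 4000, 10_000, 50_000]:
    beta = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>7g}  beta1={beta[0]: .3f}  beta2={beta[1]: .3f}")
```

In this setup, beta2 starts near 0, climbs to roughly 0.3 around alpha ≈ 4000, and then decays back toward 0: the rise-then-fall trace described above.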
