Solved – Why does Ridge Regression work well in the presence of multicollinearity

multicollinearity, ridge regression

I am learning about ridge regression and know that ridge regression tends to work better in the presence of multicollinearity. I am wondering why this is true. Either an intuitive answer or a mathematical one would be satisfying (both types of answers would be even more satisfying).

Also, I know that the ridge estimate $\hat{\beta}$ can always be obtained, but how well does ridge regression work in the presence of exact collinearity (one independent variable is a linear function of another)?
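(For context, the ridge estimator referred to here is the standard one; it exists even under exact collinearity because adding $\lambda I$ makes the matrix being inverted positive definite for any $\lambda > 0$:)

```latex
\hat{\beta}_{\text{ridge}} = \left(X^\top X + \lambda I\right)^{-1} X^\top y,
\qquad \lambda > 0,
```

where $X^\top X + \lambda I$ is invertible even when $X^\top X$ is singular, since its eigenvalues are all at least $\lambda$.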

Best Answer

Consider the simple case of two predictor variables ($x_1$, $x_2$). If there is little or no collinearity and good spread in both predictors, then we are fitting a plane to the data ($y$ is the 3rd dimension), and there is often a very clear "best" plane. But with collinearity the relationship is really a line through 3-dimensional space with the data scattered around it. When the regression routine tries to fit a plane to a line, there are an infinite number of planes that intersect perfectly with that line; which plane is chosen depends on the influential points in the data, and changing one of those points just a little can change the "best" fitting plane quite a bit.

What ridge regression does is pull the chosen plane towards simpler/saner models (biasing values towards 0). Think of a rubber band from the origin (0,0,0) to the plane: it pulls the plane towards 0 while the data pull it away, and the fit is a nice compromise between the two.
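The instability described above is easy to see numerically. Here is a small sketch (using only numpy; the data, penalty $\lambda = 1$, and perturbation size are arbitrary choices for illustration): two nearly collinear predictors are fit by ordinary least squares and by ridge, then a single data point is nudged slightly and both models are refit. The OLS coefficients jump around far more than the ridge coefficients do.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)   # x2 is almost an exact copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

lam = 1.0  # ridge penalty (an arbitrary illustrative choice)

def ols(X, y):
    # Least-squares fit; near-collinear columns make this ill-conditioned
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    # Solve (X'X + lam*I) beta = X'y; the lam*I term keeps it well-conditioned
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ols(X, y)
beta_ridge = ridge(X, y, lam)

# Nudge one entry of one data point slightly and refit both models
X_pert = X.copy()
X_pert[0, 1] += 1e-3

shift_ols = np.linalg.norm(ols(X_pert, y) - beta_ols)
shift_ridge = np.linalg.norm(ridge(X_pert, y, lam) - beta_ridge)

print("OLS coefficient shift:  ", shift_ols)
print("Ridge coefficient shift:", shift_ridge)
```

The ridge shift is orders of magnitude smaller: the "rubber band" toward the origin pins down which of the many near-equivalent planes gets chosen, so tiny changes in the data no longer swing the fit.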