Why is regularization used in linear regression?

linear algebra, linear regression, machine learning, regularization

I already understand that the point of regularization is to penalize (drive down) higher-order parameters of a model, thereby increasing its generality. Outside of polynomial regression, I do not understand why regularization would be needed for linear models, such as the Tikhonov regularization term in the analytical approach to linear regression:

$$\hat{\beta} = (X^TX+\lambda I)^{-1}X^Ty$$

Here $X$ is the design matrix, $I$ is the identity matrix with the same dimensions as $X^TX$, and $\lambda \in \mathbb{R}$.
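For concreteness, here is a minimal numpy sketch of that closed form (the function name `ridge_closed_form` and the toy data are just illustrative, not from any particular library): with $\lambda = 0$ it reduces to ordinary least squares, and a larger $\lambda$ shrinks the coefficients toward zero.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimate via the closed form (X'X + lam*I)^{-1} X'y.

    Uses np.linalg.solve instead of forming the inverse explicitly.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Illustrative toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

print(ridge_closed_form(X, y, lam=0.0))   # lambda = 0: ordinary least squares
print(ridge_closed_form(X, y, lam=10.0))  # larger lambda shrinks the coefficients
```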

From an intuitive standpoint, I do not understand why regularization is needed if the generality of the model is already fixed by the constraint on the order of the hypothesis (outside of ensuring invertibility). Thanks.

Best Answer

Tikhonov regularization is purely for invertibility, whereas things like the LASSO, ridge regression, and the elastic net are for when you want to pick explanatory variables but are worried about over-fitting.

If you are familiar with $R^2$, you know that adding another explanatory variable never decreases the $R^2$ of the model (and in practice almost always increases it). This leads to models that do very well in-sample but give very poor out-of-sample predictions. The LASSO, least-angle regression, random forests, etc. use related ideas to minimize expected (mean-squared) out-of-sample error. In other words, you want to throw away the explanatory variables that contribute to over-fitting.
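A toy numpy sketch of that point (the data and dimensions are made up for illustration): with a fixed training set, piling in more and more irrelevant columns can only push the in-sample $R^2$ up, while the out-of-sample error typically gets worse.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test = 40, 1000
x = rng.normal(size=(n_train + n_test, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n_train + n_test)  # one real predictor plus noise
junk_all = rng.normal(size=(n_train + n_test, 30))      # irrelevant columns

def fit_and_score(k_junk):
    """OLS with k_junk irrelevant columns added: in-sample R^2 vs. out-of-sample MSE."""
    X = np.hstack([np.ones((n_train + n_test, 1)), x, junk_all[:, :k_junk]])
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    resid = y_tr - X_tr @ beta
    r2 = 1.0 - resid @ resid / np.sum((y_tr - y_tr.mean()) ** 2)
    mse = np.mean((y_te - X_te @ beta) ** 2)
    return r2, mse

for k in (0, 10, 20, 30):
    r2, mse = fit_and_score(k)
    print(f"{k:2d} junk columns: in-sample R^2 = {r2:.3f}, test MSE = {mse:.3f}")
```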

But this brings you back to regularization. The basic regularization setting is the one where you have more explanatory variables than observations, so $(X'X)$ is singular and the normal equations $(X'X)\beta = X'y$ have no unique solution. The over-fitting problem is different: you have enough data to fit a linear model (i.e., to solve $(X'X)\beta = X'y$), but you suspect the resulting model will be too sensitive to the particular sample. Similar tools can help you make good decisions about which explanatory variables are the most useful.
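A small sketch of the invertibility point (the dimensions here are assumed for illustration): with more columns than rows, $X'X$ is rank-deficient, but $X'X + \lambda I$ is positive definite for any $\lambda > 0$, so the regularized system has a unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 50                       # more explanatory variables than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # at most n = 20 < p = 50, so X'X is singular

# The normal equations (X'X) beta = X'y then have infinitely many solutions,
# but X'X + lambda*I is positive definite for any lambda > 0, so the
# regularized system can be solved directly.
lam = 1.0
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)             # (50,) -- a unique, well-defined estimate
```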