Ridge Regression – How to Derive Ridge Regression Solution

least squares, regression, regularization, ridge regression

I am having some issues with the derivation of the solution for ridge regression.

I know the regression solution without the regularization term:

$$\beta = (X^TX)^{-1}X^Ty.$$

But after adding the L2 penalty $\lambda\|\beta\|_2^2$ to the cost function, why does the solution become

$$\beta = (X^TX + \lambda I)^{-1}X^Ty?$$

Best Answer

It suffices to add the penalty to the loss function. In matrix terms, the quadratic loss becomes $$ (Y - X\beta)^{T}(Y-X\beta) + \lambda \beta^T\beta.$$ Differentiating with respect to $\beta$ and setting the gradient to zero gives $$ -2X^{T}(Y - X\beta) + 2\lambda\beta = 0,$$ which rearranges into the normal equation $$ X^{T}Y = \left(X^{T}X + \lambda I\right)\beta. $$ Solving for $\beta$ yields the ridge estimator $\beta = (X^TX + \lambda I)^{-1}X^TY$. Note that for $\lambda > 0$ the matrix $X^TX + \lambda I$ is always invertible (it is positive definite), even when $X^TX$ itself is singular.
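A quick numerical sanity check of the derivation, sketched with NumPy on synthetic data (the data, $\lambda = 0.7$, and the variable names are illustrative assumptions): solve the ridge normal equation and verify that the gradient of the penalized loss vanishes at the solution.

```python
import numpy as np

# Synthetic regression problem (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

lam = 0.7  # ridge penalty strength, an arbitrary positive choice
p = X.shape[1]

# Ridge estimator: solve (X^T X + lam * I) beta = X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient of (y - X beta)^T (y - X beta) + lam * beta^T beta
# at beta_ridge; it should be zero up to floating-point error.
grad = -2 * X.T @ (y - X @ beta_ridge) + 2 * lam * beta_ridge
print(np.max(np.abs(grad)))
```

Because `beta_ridge` satisfies the normal equation exactly (up to floating point), the printed maximum gradient component is on the order of machine precision.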
