Rewriting the Ridge Regression coefficients

machine-learning, regression, ridge-regression, weights

In Ridge Regression we try to find the minimum of the following loss function:

$$\min_w\mathcal{L}_{\lambda}(w,S)=\min_w\left(\lambda\|w\|^2+\sum^l_{i=1}(y_i-g(x_i))^2\right)$$

Where:

  • $\lambda$ is a positive number that defines the relative trade-off between the norm penalty and the squared loss
  • $\mathcal{L}$ is the loss function
  • $w\in\mathbb{R}^n$ is the vector of weights
  • $g(x_i)=\langle w,x_i\rangle$ is the predicted value for observation $x_i$
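
For concreteness, here is a minimal NumPy sketch of this loss (the function name and toy shapes are my own, and it assumes the linear model $g(x_i)=\langle w,x_i\rangle$ from above):

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """L_lambda(w, S) = lam * ||w||^2 + sum_i (y_i - <w, x_i>)^2"""
    residuals = y - X @ w              # y_i - g(x_i) for every training point
    return lam * w @ w + residuals @ residuals
```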

Taking the derivative of the loss function with respect to the parameters and setting it equal to zero, we obtain the equations (*)

$$X'Xw+\lambda w=(X'X+\lambda I_n)w=X'y$$

Where:

  • $I_n$ is the $n\times n$ identity matrix
  • $X\in \mathbb{R}^{l\times n}$ is the data matrix
  • $X'$ is the transpose of $X$

The solution to the above equation is

$$w=(X'X+\lambda I_n)^{-1}X'y$$
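
As a sanity check, this solution can be computed and verified against equations (*) with a small NumPy sketch (the toy data and variable names are my own; I use a linear solve rather than forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
l, n, lam = 50, 3, 0.1
X = rng.normal(size=(l, n))                  # data matrix, l observations by n features
y = rng.normal(size=l)                       # targets

# w = (X'X + lambda * I_n)^{-1} X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# the normal equations (*) hold: X'Xw + lambda*w = X'y
print(np.allclose(X.T @ X @ w + lam * w, X.T @ y))   # True
```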

Now, my book says that we can rewrite equations (*) in terms of $w$:

$$w=\lambda^{-1}X'(y-Xw)=X'\alpha$$

showing that $w$ can be written as a linear combination of the training points, $w=\sum^l_{i=1}\alpha_ix_i$, with $\alpha=\lambda^{-1}(y-Xw)$.

I have a hard time understanding how $w=\lambda^{-1}X'(y-Xw)$ is derived. Can someone show this algebraically?

Best Answer

Just:

$X'y = X'Xw + \lambda w $

$X'y - X'Xw = \lambda w $

$X'(y - Xw) = \lambda w $

$w = \lambda^{-1}X'(y - Xw) $

$w = X'\alpha $ with $\alpha=\lambda^{-1}(y - Xw) $
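
For completeness, the identity is easy to confirm numerically; a minimal sketch with made-up data (the names and toy setup are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
l, n, lam = 50, 3, 0.1
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

# primal solution: w = (X'X + lambda * I_n)^{-1} X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# alpha = lambda^{-1} (y - Xw), and then w = X'alpha = sum_i alpha_i x_i
alpha = (y - X @ w) / lam
print(np.allclose(w, X.T @ alpha))   # True
```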
