Solved – Why are solutions to ridge regression always expressed using matrix notation

Tags: regression, ridge regression

Consider the following ridge regression problem: minimize the loss function $\sum_{i=1}^n (y_i - w^T x_i)^2 + \lambda \|w\|_2^2$ with respect to the weight vector $w$. Taking the derivative with respect to $w$, I get $\sum_{i=1}^n 2(y_i - w^T x_i)(-x_i) + 2\lambda w$, which (after setting it to zero) implies $w = \frac{1}{\lambda}\sum_{i=1}^n (y_i - w^T x_i)\, x_i$. Is this wrong? I know that the solution is $(X^TX + \lambda I)^{-1}X^Ty$.
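As a quick sanity check (a minimal NumPy sketch of my own; the data and the values of `n`, `d`, and `lam` are arbitrary assumptions), the closed form does zero the gradient above:

```python
import numpy as np

# Hypothetical test data: n samples, d features, penalty lam (all arbitrary).
rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.1
X = rng.standard_normal((n, d))   # row i of X is x_i^T
y = rng.standard_normal(n)

# Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Gradient written as the per-sample sum from the question:
#   sum_i 2 (y_i - w^T x_i)(-x_i) + 2 lam w
grad = sum(2 * (y[i] - w @ X[i]) * (-X[i]) for i in range(n)) + 2 * lam * w
print(np.allclose(grad, 0))       # True: the closed form is a stationary point
```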

Best Answer

Your derivative is okay. Just remember to put all the $w$-terms on the same side of the equation:
$$\sum_i y_i x_i = \lambda w + \sum_i x_i x_i^T w$$
Then pull $w$ out of the summation, since it's independent of $i$:
$$\sum_i y_i x_i = \Big(\lambda I + \sum_i x_i x_i^T\Big)w$$
At this point, dispose of the summations in favor of matrix notation:
$$X^Ty = \big(\lambda I + X^TX\big)w$$
where $x_i^T$ is the $i^{th}$ row of $X$ and $y_i$ is the $i^{th}$ component of $y$. Solving this linear system gives the closed form $w = (X^TX + \lambda I)^{-1}X^Ty$.
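If you want to convince yourself that the summation and matrix forms coincide, here is a small numerical sketch (the data `X`, `y` and the penalty `lam` are arbitrary assumptions, with row $i$ of `X` equal to $x_i^T$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 4, 0.5
X = rng.standard_normal((n, d))   # row i of X is x_i^T
y = rng.standard_normal(n)

# Summation forms from the derivation above
rhs = sum(y[i] * X[i] for i in range(n))                            # sum_i y_i x_i
A = lam * np.eye(d) + sum(np.outer(X[i], X[i]) for i in range(n))   # lam I + sum_i x_i x_i^T

# They match the matrix forms X^T y and lam I + X^T X
print(np.allclose(rhs, X.T @ y))                  # True
print(np.allclose(A, lam * np.eye(d) + X.T @ X))  # True

# Solving either form of (lam I + X^T X) w = X^T y gives the same w
w = np.linalg.solve(A, rhs)
print(np.allclose(w, np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)))  # True
```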