[Math] How to take a partial derivative of $\|y - Xw\|^2$ with respect to $w$

calculus, linear algebra, normed-spaces, partial derivative, vectors

So I've tried to solve this problem where we are asked to find the partial derivative of the function $\sum_i (y - Xw)_i^2$, i.e. $\|y-Xw\|^2$, with respect to $w$ and then minimize it. I've never done any linear algebra aside from some really basic stuff, and I can't seem to find any information on how to take the partial derivative of such a function. I know how to take the partial derivative of a simple function, but not of functions written with the $\|x\|^2$ notation.

In this case it would probably help to introduce a variable, e.g. $z = y - Xw$, so that we get $\|z\|^2$. Is this even the right approach? I have no clue what to do after this.

Best Answer

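An easy way to do it is to write the squared norm as a scalar product and then use the properties of the scalar product. I assume you are working with real matrices and vectors (it really seems that you are dealing with ordinary least squares).
\begin{align} \|y-Xw\|^2 &=\langle y-Xw,y-Xw \rangle \\ &=y^Ty-2w^TX^Ty+w^TX^TXw. \end{align}
I've just used the fact that the scalar product is bilinear and symmetric. Now take the gradient with respect to $w$, term by term: $y^Ty$ does not depend on $w$, the linear term $w^TX^Ty$ has gradient $X^Ty$, and the quadratic term $w^TX^TXw$ has gradient $2X^TXw$ (product rule, using that $X^TX$ is symmetric). Hence
$$\nabla_w \|y-Xw\|^2 = -2X^Ty+2X^TXw.$$
To minimize, set the gradient to zero:
\begin{align} -2X^Ty+2X^TXw &= 0 \\ \implies X^TXw &= X^Ty \\ \implies w &= (X^TX)^{-1}X^Ty, \end{align}
where the last step requires $X^TX$ to be invertible, i.e. $X$ must have full column rank. If you have trouble with the derivation in general, just write out the two-dimensional case.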
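If it helps to see the result numerically, here is a minimal sketch in Python with NumPy. It is not part of the derivation, only a sanity check. The matrix `X`, the vector `y`, their sizes, and the helper names `loss` and `grad` are made up for illustration. It compares the gradient $-2X^Ty+2X^TXw$ against central finite differences, then checks that $w=(X^TX)^{-1}X^Ty$ makes the gradient vanish.

```python
import numpy as np

# Hypothetical data: 20 observations, 3 unknowns (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

def loss(w):
    """Squared norm ||y - Xw||^2."""
    r = y - X @ w
    return r @ r

def grad(w):
    """Gradient from the derivation: -2 X^T y + 2 X^T X w."""
    return -2 * X.T @ y + 2 * X.T @ X @ w

# Central finite differences at an arbitrary point w0,
# one coordinate direction at a time.
w0 = rng.normal(size=3)
eps = 1e-6
num_grad = np.array([
    (loss(w0 + eps * e) - loss(w0 - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

# Solve the normal equations X^T X w = X^T y
# (assumes X^T X is invertible, which holds for this random X).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's built-in least-squares solver, for comparison.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("analytic vs numeric gradient agree:", np.allclose(num_grad, grad(w0), atol=1e-4))
print("gradient at w_star is ~0:", np.allclose(grad(w_star), 0.0, atol=1e-8))
print("matches np.linalg.lstsq:", np.allclose(w_star, w_lstsq))
```

In practice one would usually call `np.linalg.lstsq` (or solve the normal equations with `np.linalg.solve`) rather than form the inverse $(X^TX)^{-1}$ explicitly, since that tends to be better behaved numerically.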