Solved – Differentiating the RSS w.r.t. $\beta$ in Linear Model

Tags: matrix, mse, regression, residuals

I am reading the book "The Elements of Statistical Learning". The book says
[image: the book's result, $\frac{\partial RSS}{\partial \beta} = -2X^T(y - X\beta)$]

But when I try to prove it, I get the following:

$$RSS(\beta) = (y - X\beta)^T(y-X\beta)$$
$$RSS(\beta) = y^Ty -\beta^TX^Ty -y^TX\beta+\beta^TX^TX\beta$$
$$\frac{\partial{RSS}}{\partial{\beta}} = -y^TX - y^TX + \beta^T(X^TX + X^TX) = -2\beta^T(y^TX + X^TX)$$

What is wrong in my derivation?

I have looked at the Wikipedia page on matrix calculus, and I find I was misled by the meaning of numerator-layout versus denominator-layout notation. Can anyone give an intuitive explanation of when to use numerator-layout notation and when to use denominator-layout notation?
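To make the question concrete (using a fixed vector $a$ of my own as an illustration, not notation from the book): the same scalar $a^T\beta$ differentiates to a row vector in numerator layout but a column vector in denominator layout,

$$\text{numerator layout: } \frac{\partial (a^T\beta)}{\partial \beta} = a^T, \qquad \text{denominator layout: } \frac{\partial (a^T\beta)}{\partial \beta} = a.$$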

Best Answer

Note that $-y^TX - y^TX + \beta^T(X^TX + X^TX) \ne -2\beta^T(y^TX + X^TX)$: you pulled a $\beta^T$ out of the first two terms, which contain no $\beta$ at all, so the algebra in that last step is invalid.

You have $$RSS = y^Ty - \beta^TX^Ty - y^TX\beta + \beta^TX^TX\beta.$$ Notice that $y^TX\beta$ is a scalar, so $y^TX\beta = (y^TX\beta)^T = \beta^TX^Ty$, and the two middle terms combine: $$RSS = y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta.$$ Differentiating term by term in denominator layout (so the derivative of a scalar with respect to the column vector $\beta$ is again a column vector) gives $$\frac{\partial{RSS}}{\partial{\beta}} = -2X^Ty + 2X^TX\beta.$$
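For reference, the two standard denominator-layout identities used in that last step are

$$\frac{\partial(\beta^Ta)}{\partial \beta} = a \qquad \text{and} \qquad \frac{\partial(\beta^TA\beta)}{\partial \beta} = (A + A^T)\beta,$$

and since $A = X^TX$ is symmetric, the second reduces to $2X^TX\beta$.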

$$-2X^Ty + 2X^TX\beta = -2X^T(y - X\beta),$$ which is exactly the book's formula.
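As a quick numerical sanity check (a sketch of mine, not part of the original answer; the variable names and sizes are arbitrary), the closed-form gradient can be compared against a central finite-difference approximation on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

def rss(b):
    # RSS(beta) = (y - X beta)^T (y - X beta)
    r = y - X @ b
    return r @ r

# Closed-form gradient from the derivation above: -2 X^T (y - X beta)
grad = -2 * X.T @ (y - X @ beta)

# Central finite-difference approximation, one coordinate at a time
eps = 1e-6
fd = np.array([(rss(beta + eps * e) - rss(beta - eps * e)) / (2 * eps)
               for e in np.eye(p)])

print(np.allclose(grad, fd, atol=1e-4))  # expected: True
```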
