[Math] Derivative of the dot product in the Residual Sum of Squares in matrix notation

derivatives, least-squares, matrix-calculus, regression

I am trying to differentiate the following expression w.r.t. $\beta$:

\begin{equation}
RSS(\beta) = (\mathbf{y} - \mathbf{X} \beta)^T (\mathbf{y} - \mathbf{X} \beta)
\end{equation}

I know that the derivative of a dot product follows the product rule,

\begin{equation}
\frac{d}{dx}(\mathbf{r}(x) \cdot \mathbf{s}(x)) = \mathbf{r'}(x) \cdot \mathbf{s}(x) + \mathbf{r}(x) \cdot \mathbf{s'}(x)
\end{equation}

but I do not understand how to apply it when transposes are involved.

Best Answer

It is easier to expand the brackets first:
\begin{align}
\mbox{RSS}(b) &= (y-Xb)'(y-Xb) = y'y - y'Xb - b'X'y + b'X'Xb\\
&= y'y - 2b'X'y + b'X'Xb,
\end{align}
where the last step uses the fact that $y'Xb$ is a scalar, so $y'Xb = (y'Xb)' = b'X'y$. Differentiating with respect to $b$ and setting the gradient to zero at $\hat{b}$ gives
\begin{align}
\frac{\partial}{\partial b}\mbox{RSS}(\hat{b}) = -2X'y + 2X'X\hat{b} = 0.
\end{align}
Rearranging the equation yields the normal equations
$$
X'X\hat{b} = X'y.
$$
Assuming no perfect multicollinearity, $(X'X)^{-1}$ exists, so
$$
\hat{b} = (X'X)^{-1}X'y,
$$
which is the OLS estimator of $b$.
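
For the product-rule route the question asks about, here is a sketch (not part of the original answer): both factors are the same vector function $r(b) = y - Xb$, whose Jacobian is $\partial r/\partial b = -X$, and the vector form of the product rule for a scalar $r(b) \cdot s(b)$ is $\nabla_b\,(r \cdot s) = (\partial r/\partial b)'\,s + (\partial s/\partial b)'\,r$. Applied here,
\begin{align}
\nabla_b\,\mbox{RSS}(b) &= (-X)'(y-Xb) + (-X)'(y-Xb)\\
&= -2X'(y-Xb) = -2X'y + 2X'Xb,
\end{align}
the same gradient as obtained by expanding the brackets.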
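
A minimal numerical check of the result, assuming NumPy and random illustrative data (the variable names are mine, not from the answer): the normal-equation solution should agree with a generic least-squares solver, and the gradient $-2X'y + 2X'X\hat{b}$ should vanish at it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))   # design matrix (full column rank w.h.p.)
y = rng.normal(size=n)        # response vector

# OLS via the normal equations X'X b = X'y
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of RSS at b_hat: -2X'y + 2X'X b_hat, should be ~0
grad = -2 * X.T @ y + 2 * X.T @ X @ b_hat

print(np.allclose(b_hat, b_lstsq))      # True
print(np.allclose(grad, 0, atol=1e-8))  # True
```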