Derivation of Linear Regression using Normal Equations

linear-regression, matrix-calculus

I was going through Andrew Ng's ML course and have a question about one of the steps in the derivation of the normal-equation solution for linear regression.

Normal equation: $\theta=(X^TX)^{-1}X^TY$
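For concreteness, here is a small numerical sanity check of the normal equation against NumPy's built-in least-squares solver (the data here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # design matrix (50 samples, 3 features)
Y = rng.normal(size=50)        # targets

# Normal-equation solution: theta = (X^T X)^{-1} X^T Y
theta = np.linalg.inv(X.T @ X) @ X.T @ Y

# NumPy's least-squares solver should give the same answer
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(theta, theta_lstsq)
```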

While deriving, there's this step:

$\frac{\partial}{\partial\theta}\theta^TX^TX\theta = X^TX\frac{\partial}{\partial\theta}\theta^T\theta$

But matrix multiplication isn't commutative, so how are we allowed to pull $X^TX$ out of the derivative like that?

Thanks

Best Answer

Given two symmetric matrices $A$ and $B$, consider the following scalar functions and their gradients: $$\eqalign{ \alpha &= \theta^TA\theta &\implies \frac{\partial\alpha}{\partial\theta}=2A\theta \cr \beta &= \theta^TB\theta &\implies \frac{\partial\beta}{\partial\theta}=2B\theta \cr }$$ It's not terribly illuminating, but you can write the second gradient in terms of the first (assuming $A$ is invertible), i.e. $$\frac{\partial\beta}{\partial\theta} = BA^{-1}\frac{\partial\alpha}{\partial\theta}$$ For the purposes of your question, $A=I$ and $B=X^TX$. So the step is not claiming that matrix multiplication commutes; it is just this gradient identity.
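You can verify the gradient identity $\frac{\partial}{\partial\theta}\theta^TA\theta = 2A\theta$ numerically with a finite-difference check (random symmetric $A$, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M.T @ M                      # symmetric, like X^T X
theta = rng.normal(size=4)

f = lambda t: t @ A @ t          # alpha(theta) = theta^T A theta

# Central finite-difference approximation of the gradient
eps = 1e-6
grad_fd = np.array([
    (f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

# Matches the closed-form gradient 2*A*theta
assert np.allclose(grad_fd, 2 * A @ theta, atol=1e-5)
```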
