[Math] Matrix Calculus in Least-Square method

derivativesleast squaresmatrix-calculusmultivariable-calculus

In the proof of matrix solution of Least Square Method, I see some matrix calculus, which I have no clue. Can anyone explain to me or recommend me a good link to study this sort of matrix calculus?

In Least-Square method, we want to find such a vector $x$ such that $||Ax-b||$ is minimized.

Assume $r=Ax-b$

$\Rightarrow\|r\|^2=x^TA^TAx-2b^TAx+b^Tb$

$\Rightarrow \nabla_x \|r\|^2=2A^TAx-2A^Tb$

In the end we set the gradient to zero and find the minimized solution. I understand the whole idea, but I just don't know how exactly we did matrix calculus here, or say I don't know how to do the matrix calculus here. For example, can anyone tell me how we got those transpose in $\|r\|^2$(By what rule?) and how we got the gradient?(how do we take the gradient exactly in matrix format)?

I'll really appreciate if you can help me out. Thanks!

Best Answer

Well the first step is the definition of $||r||^2$. This is easy \begin{align} ||r||^2 & = \langle r,r \rangle = r^T r \\ &= (Ax-b)^T(Ax-b) = (x^TA^T-b^T)(Ax-b) \\ &= x^TA^TAx -x^TA^Tb-b^TAx +b^Tb \\ &= x^TA^TAx -(b^TAx)^T -b^TAx +b^Tb \end{align} Since $(b^TAx)$ is a scalar it holds $(b^TAx)⁼ (b^TAx)^T$ Thus \begin{align} ||r||^2 & = x^TA^TAx -2b^TAx +b^Tb \end{align}

And for the derivatives, you could take a look here. Another approach would be to write out the matrix-vector expressions in sumation form and calculate the derivative, then no matrices are involved.

Related Question