Finding $\frac{\partial }{\partial A}\left| \left(A\left(A^{\top} A\right)^{-1} A^{\top} - I \right)b \right|^2$

linear-algebra matrices matrix-calculus

For an $n\times m$ matrix $\mathbf{A}$ and an $n\times 1$ vector $\mathbf{b}$, I'm trying to find the derivative of the following squared L2 norm
$$
f(\mathbf{A}) = \left| \left(\mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top} - \mathbf{I} \right)\mathbf{b} \right|^2
= \mathbf{b}^{\top} \left(\mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top} - \mathbf{I} \right)^{\top}\left(\mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top} - \mathbf{I} \right)\mathbf{b}
$$

with respect to $\mathbf{A}$, but I'm not sure how to approach larger products like this one.

I'm assuming the chain rule can be applied in some way to help? e.g. if we let
$$
\mathbf{Z} = \mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top} - \mathbf{I}
$$

Then
$$
\frac{\partial f}{\partial \mathbf{Z}} = \frac{\partial }{\partial \mathbf{Z}} (\mathbf{Zb})^{\top}(\mathbf{Zb}) = 2\mathbf{Z} \mathbf{bb}^{\top}
$$

but I'm not sure how to approach finding the derivative of the $\mathbf{A}\left(\mathbf{A}^{\top} \mathbf{A}\right)^{-1} \mathbf{A}^{\top}$ term.

Any help much appreciated!

Best Answer

The hat matrix of $A$ is defined as $$H = A(A^TA)^{-1}A^T$$ This matrix is an orthoprojector, since $\,H^2=H=H^T$.

The matrix $P=(I-H)$ is also an orthoprojector.
However, $\;Z=(H-I)=-P$ is not an orthoprojector, since $Z^2 = -Z \ne Z$.
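These three claims are easy to sanity-check numerically. The snippet below is only a sketch: it assumes NumPy, a random full-column-rank $A$, and arbitrary sizes `n = 6`, `m = 3` chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3                      # assumed sizes; n >= m so that A^T A is invertible
A = rng.standard_normal((n, m))  # a random A has full column rank almost surely

# H = A (A^T A)^{-1} A^T, computed with a linear solve instead of an explicit inverse
H = A @ np.linalg.solve(A.T @ A, A.T)
I = np.eye(n)
P = I - H
Z = H - I

# Orthoprojector test: idempotent and symmetric
h_is_orthoprojector = np.allclose(H @ H, H) and np.allclose(H, H.T)
p_is_orthoprojector = np.allclose(P @ P, P) and np.allclose(P, P.T)

# Z is symmetric but squares to -Z, so it is not a projector
z_squared_is_minus_z = np.allclose(Z @ Z, -Z)

print(h_is_orthoprojector, p_is_orthoprojector, z_squared_is_minus_z)
```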

The objective function can be written using the orthoprojector
$$\eqalign{ f &= b^T(-P)^T(-P)b \cr&= b^TPb\cr&=bb^T:P\cr&=bb^T:(I-H) }$$ where a colon denotes the trace/Frobenius product, i.e. $\;A:B = {\rm Tr}(A^TB)$
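The first two simplifications use only $P^T=P$, $P^2=P$, the cyclic property of the trace, and the symmetry of $bb^T$. Spelled out step by step:
$$\eqalign{
b^T(-P)^T(-P)b &= b^TP^TPb = b^TP^2b = b^TPb \cr
b^TPb &= {\rm Tr}(b^TPb) = {\rm Tr}(bb^TP) = {\rm Tr}\big((bb^T)^TP\big) = bb^T:P
}$$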

Calculate the differential and gradient of the function.
(For convenience, define $B=A^TA$.)
$$\eqalign{
df &= bb^T:(-dH) \cr
&= -bb^T:d(AB^{-1}A^T) \cr
&= -bb^T:(dA\,B^{-1}A^T+A\,dB^{-1}A^T+AB^{-1}dA^T) \cr
&= -bb^T:(2\,dA\,B^{-1}A^T+A\,dB^{-1}A^T) \cr
&= -bb^T:(2\,dA\,B^{-1}A^T-AB^{-1}\,dB\,B^{-1}A^T) \cr
&= bb^T:AB^{-1}\,dB\,B^{-1}A^T \,-\, 2bb^T:dA\,B^{-1}A^T \cr
&= B^{-1}A^Tbb^TAB^{-1}:dB \,-\, 2bb^TAB^{-1}:dA \cr
&= B^{-1}A^Tbb^TAB^{-1}:(dA^TA+A^TdA) \,-\, 2bb^TAB^{-1}:dA \cr
&= 2AB^{-1}A^Tbb^TAB^{-1}:dA \,-\, 2bb^TAB^{-1}:dA \cr
&= 2\big(Hbb^TAB^{-1} - bb^TAB^{-1}\big):dA \cr
&= 2\big(H-I\big)\,bb^TA(A^TA)^{-1} : dA \cr
\frac{\partial f}{\partial A} &= 2\big(H-I\big)\,bb^TA(A^TA)^{-1} \cr
&= 2\,Zbb^TA(A^TA)^{-1}
}$$
NB: The cyclic property of the trace allows a Frobenius product to be rearranged in many different ways. For example
$$\eqalign{
A:BC &= AC^T:B = B^TA:C = I:A^TBC \cr
A:X^T &= A^T:X \cr
A:B &= B:A
}$$
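As a check on the result, the closed-form gradient can be compared against central finite differences. This is a minimal sketch, assuming NumPy and a random full-column-rank $A$; the helper names `f`, `grad_f`, and `numerical_grad`, the sizes, and the step `eps` are illustrative choices, not part of the derivation.

```python
import numpy as np

def f(A, b):
    """Objective |(H - I) b|^2 with H = A (A^T A)^{-1} A^T."""
    H = A @ np.linalg.solve(A.T @ A, A.T)
    r = H @ b - b
    return float(r @ r)

def grad_f(A, b):
    """Closed form from the derivation: 2 (H - I) b b^T A (A^T A)^{-1}."""
    B_inv = np.linalg.inv(A.T @ A)
    H = A @ B_inv @ A.T
    Z = H - np.eye(A.shape[0])
    return 2.0 * Z @ np.outer(b, b) @ A @ B_inv

def numerical_grad(A, b, eps=1e-6):
    """Entry-by-entry central finite differences of f with respect to A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E, b) - f(A - E, b)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, m = 6, 3                      # assumed sizes; n >= m so that A^T A is invertible
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# Largest entrywise disagreement between the formula and the numerical estimate
max_abs_err = np.max(np.abs(grad_f(A, b) - numerical_grad(A, b)))
print(max_abs_err)
```

The gradient has the same $n\times m$ shape as $A$, and the printed discrepancy should be small relative to the size of the gradient entries if the formula is right.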