[Math] Derivative of trace of pseudo inverse

Given three matrices $A$ (broad, i.e. with more columns than rows), $B$ and $C$, I'd like to find the derivative of

\begin{align}
f = \textrm{tr} \{BA^+\} + \textrm{tr} \{B(A^+)^TCA^+B^T\}
\end{align}

with respect to $A$, where $A^+ = A^T(AA^T)^{-1}$ is the Moore–Penrose pseudo-inverse (this formula requires $A$ to have full row rank, so that $AA^T$ is invertible). I know how to compute $\frac{\partial \ \textrm{tr} \{ BA^+\}}{\partial A^+} = B^T$, but cannot figure out a) how to compute the derivative of $A^+$ with respect to $A$, and b) how to use the matrix chain rule to combine these two results.
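As a quick sanity check of the setup, the NumPy sketch below compares the closed form $A^+ = A^T(AA^T)^{-1}$ with NumPy's SVD-based pseudo-inverse for a random broad matrix and evaluates the first term of $f$. The shapes ($3\times 5$), the random seed and the random $B$ are illustrative assumptions; only the term $\textrm{tr}\{BA^+\}$ is evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                       # assumed shapes: A is broad (m < n)
A = rng.standard_normal((m, n))   # a Gaussian matrix has full row rank with probability 1
B = rng.standard_normal((m, n))   # B A^+ is then m x m, so its trace is defined

# Closed form of the pseudo-inverse, valid when A A^T is invertible
A_pinv = A.T @ np.linalg.inv(A @ A.T)

# It agrees with NumPy's SVD-based Moore-Penrose pseudo-inverse
assert np.allclose(A_pinv, np.linalg.pinv(A))

# First term of f
f1 = np.trace(B @ A_pinv)
print(f1)
```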

Best Answer

You should think of $A$ as a time-dependent matrix $A=A_t$, denote by $\dot{A}$ the $t$-derivative of $A$, and then think of $f$ as a function of $t$.

To compute the derivative of the inverse of a $t$-dependent invertible matrix $B_t$ (unrelated to the matrix $B$ in the question), proceed as follows:

$$ 1=B_t B_t^{-1}\Rightarrow 0= \dot{B}_t B_t^{-1}+B_t\frac{d}{dt}B_t^{-1}, $$

so that

$$ \frac{d}{dt}B_t^{-1}=-B_t^{-1}\dot{B}_t B_t^{-1}. $$
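The formula for the derivative of the inverse can be checked numerically with a central finite difference along a straight-line path $B_t = B_0 + t\dot{B}$. This is a sketch under assumed inputs: the size, the seed and the helper name `B_t` are illustrative, and $B_0$ is built as a symmetric positive definite matrix only to guarantee invertibility.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
M = rng.standard_normal((k, k))
B0 = M @ M.T + k * np.eye(k)      # symmetric positive definite, hence invertible
Bdot = rng.standard_normal((k, k))  # derivative of the path at t = 0

def B_t(t):
    # Hypothetical straight-line path B_t = B0 + t * Bdot, so dB_t/dt = Bdot
    return B0 + t * Bdot

h = 1e-6
# Central-difference approximation of d/dt B_t^{-1} at t = 0
numeric = (np.linalg.inv(B_t(h)) - np.linalg.inv(B_t(-h))) / (2 * h)

# Closed form: d/dt B_t^{-1} = -B_t^{-1} Bdot B_t^{-1}
B0_inv = np.linalg.inv(B0)
closed_form = -B0_inv @ Bdot @ B0_inv

assert np.allclose(numeric, closed_form, atol=1e-6)
```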

Now let $B_t= A_tA_t^*$, where $A^*$ denotes the (conjugate) transpose, so $A^*=A^T$ for real matrices. Then $\dot{B}_t=\dot{A}_tA_t^*+A_t\dot{A}_t^*$ and

$$A_t^+=A_t^* B_t^{-1},$$

$$\dot{A}_t^+= \dot{A}^*_t B_t^{-1}+A_t^*\frac{d}{dt} {B_t}^{-1}=\dot{A}^*_t B_t^{-1}-A_t^* B_t^{-1}\dot{B_t} B_t^{-1}. $$

The Fréchet derivative of $A\mapsto A^+$ at the point $A_0$ is the linear map $H\mapsto L(H)$ obtained from the above equality by setting $t=0$ and $\dot{A}_0= H$:

$$ H\mapsto L(H)=H^*B_0^{-1}-A_0^*B_0^{-1}\dot{B}_0 B_0^{-1} =H^*B_0^{-1}-A_0^*B_0^{-1}(HA_0^*+A_0H^*)B_0^{-1}, $$

where $B_0= A_0A_0^*$.

This follows from the fact that the Fréchet derivative of $A\mapsto A^+$ in the direction $H$ is the derivative at $t=0$ of the path $t\mapsto A_t^+$, where $A_t=A_0+tH$.
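The Fréchet derivative $L(H)$ can be compared with a central difference of the path $t\mapsto (A_0+tH)^+$, which is exactly the characterization stated above. The sketch assumes real matrices (so $A^*=A^T$), a random broad $A_0$ and a random direction $H$; the helper name `frechet_pinv` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A0 = rng.standard_normal((m, n))  # broad, full row rank with probability 1
H = rng.standard_normal((m, n))   # direction of differentiation

def frechet_pinv(A0, H):
    """L(H) = H^T B0^{-1} - A0^T B0^{-1} (H A0^T + A0 H^T) B0^{-1}, B0 = A0 A0^T (real case)."""
    B0_inv = np.linalg.inv(A0 @ A0.T)
    return H.T @ B0_inv - A0.T @ B0_inv @ (H @ A0.T + A0 @ H.T) @ B0_inv

# Derivative at t = 0 of t -> (A0 + t H)^+, by central differences
h = 1e-6
numeric = (np.linalg.pinv(A0 + h * H) - np.linalg.pinv(A0 - h * H)) / (2 * h)

assert np.allclose(frechet_pinv(A0, H), numeric, atol=1e-6)
```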

Update. The $\DeclareMathOperator{\tr}{tr}$ trace defines an inner product on any space of matrices of a fixed size (square or rectangular) via the equality

$$(A,B):=\tr(A^* B). $$
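For real matrices this trace inner product is the sum of the entrywise products (the Frobenius inner product), which is what makes "gradient" mean the matrix of partial derivatives. A minimal check, with the dummy matrices called `X` and `Y` (an assumption, to avoid a clash with the question's $A$ and $B$) and rectangular shapes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((3, 5))

# (X, Y) := tr(X^* Y); for real matrices X^* is the transpose
inner_trace = np.trace(X.T @ Y)

# The same number as the sum of entrywise products
inner_entrywise = np.sum(X * Y)

assert np.isclose(inner_trace, inner_entrywise)
```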

The differential at $A_0$ of the function $A\mapsto\tr(BA^+)$ is the linear functional on the space of matrices of the same size as $A$ given by

$$ H\mapsto L(H) =\tr B \bigl(H^*(A_0A_0^*)^{-1}-A_0^*(A_0A_0^*)^{-1}(HA_0^*+A_0H^*)(A_0A_0^*)^{-1}\bigr). $$

To simplify the presentation, let me set $S=(A_0A_0^*)^{-1}$ so that

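$$ L(H)= \tr B(H^*S-A_0^*SHA_0^*S-A_0^*SA_0H^*S). $$

Assume from now on that all matrices are real, so that $A_0^*=A_0^T$, $S$ is symmetric and $\tr(XH^*)=\tr(X^*H)$ for every $X$. Using the cyclicity of the trace,

$$= \tr BH^*S-\tr BA_0^*SHA_0^*S-\tr BA_0^*SA_0H^*S $$

$$ =\tr SBH^*-\tr A_0^*SBA_0^*SH-\tr SBA_0^*SA_0H^* $$

$$ =\tr B^*SH-\tr A_0^*SBA_0^*SH-\tr A_0^*SA_0B^*SH $$

$$= \tr \underbrace{(B^*S - A_0^*SBA_0^*S - A_0^*SA_0B^*S)}_{=: G} H =(G^*, H). $$

Since $A_0^*S=A_0^+$, this matrix can also be written as

$$ G^* = SB - SA_0B^*SA_0 - SBA_0^*SA_0 = SB(1-A_0^+A_0)-(A_0^+)^*B^*(A_0^+)^*. $$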

We deduce that the gradient at $A_0$ of the function $A\mapsto \tr(B A^+)$ is the matrix $G^*$.
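The gradient can be checked entry by entry against finite differences of $A\mapsto\tr(BA^+)$. The same differential also answers part b) of the question: if $M=\partial g/\partial X$ at $X=A^+$ (convention $dg=\tr(M^T dX)$), then for real $A$ of full row rank, $dA^+=(1-A^+A)H^TS-A^+HA^+$ gives $\nabla_A\, g(A^+) = SM^T(1-A^+A)-(A^+)^TM(A^+)^T$. The sketch below implements that as a hypothetical helper `grad_through_pinv` and applies it with $M=B^T$; shapes and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5
A0 = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

def grad_through_pinv(A, M):
    """Gradient w.r.t. A of g(A^+), given M = dg/dX at X = A^+ (real A, full row rank).

    Uses dA^+ = (I - A^+ A) H^T S - A^+ H A^+ with S = (A A^T)^{-1}.
    """
    S = np.linalg.inv(A @ A.T)
    A_pinv = A.T @ S
    P = np.eye(A.shape[1]) - A_pinv @ A  # projector onto the null space of A
    return S @ M.T @ P - A_pinv.T @ M @ A_pinv.T

# For g(X) = tr(B X), dg/dX = B^T, so this is the gradient of tr(B A^+)
G_star = grad_through_pinv(A0, B.T)

# Same matrix in the form G^* = S B - S A0 B^T S A0 - S B A0^T S A0
S = np.linalg.inv(A0 @ A0.T)
G_star_text = S @ B - S @ A0 @ B.T @ S @ A0 - S @ B @ A0.T @ S @ A0
assert np.allclose(G_star, G_star_text)

def f1(A):
    return np.trace(B @ np.linalg.pinv(A))

# Central-difference approximation of each partial derivative
h = 1e-6
numeric = np.zeros_like(A0)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(A0)
        E[i, j] = h
        numeric[i, j] = (f1(A0 + E) - f1(A0 - E)) / (2 * h)

assert np.allclose(G_star, numeric, atol=1e-5)
```

The helper takes $M$ as input so that the same function can be reused for any scalar function of $A^+$ whose derivative with respect to $A^+$ is known.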
