Derivative of a matrix with respect to elements of another matrix

Tags: linear-algebra, matrices, matrix-equations, matrix-calculus, multivariable-calculus

I am trying to take the "derivative" of a matrix $M = ABB^TA^T$ with respect to the inner matrix $B$, where $A$ is $n\times n$ and $B$ is $n\times m$ with $m \leq n$.

In other words, I am looking for a closed-form expression for $\frac{\partial M_{ab}}{\partial B_{ij}}$ for all $1\leq a,b,i \leq n$ and $1\leq j \leq m$.

Eventually, the goal would be to use this to find the derivative of the sum of all elements of $M$ with respect to the elements of $B$. So if $\mathbf{1}$ is the $n\times 1$ vector of $1$'s, then let:

$f(B) = \sum_{a=1}^n \sum_{b=1}^n M_{ab} = \mathbf{1}^T A B B^T A^T \mathbf{1}$

My end goal is to find $\frac{\partial f(B)}{\partial B_{ij}}$. Since the derivative of a sum is the sum of the derivatives, I should be able to obtain it from a closed-form expression for $\frac{\partial M_{ab}}{\partial B_{ij}}$.

This seems like it should have a simple solution but I have been stuck on it for a while. Any help is welcome and appreciated.

Best Answer

Use a colon to denote the trace/Frobenius product, i.e.

$$A:B = {\rm Tr}(A^TB) = {\rm Tr}(AB^T) = B:A$$

and use it to rewrite the cost function, writing ${\tt1}$ for the all-ones vector. Then calculate its differential and gradient.

$$\eqalign{
f &= {\rm Tr}({\tt1}^TABB^TA^T{\tt1}) \\
  &= {\rm Tr}(A^T{\tt1}{\tt1}^TABB^T) \\
  &= A^T{\tt1}{\tt1}^TA:BB^T \\
df &= A^T{\tt1}{\tt1}^TA:(dB\,B^T+B\,dB^T) \\
  &= 2A^T{\tt1}{\tt1}^TA:dB\,B^T \\
  &= 2A^T{\tt1}{\tt1}^TAB:dB \\
\frac{\partial f}{\partial B} &= 2A^T{\tt1}{\tt1}^TAB
}$$

The factor of $2$ appears because $A^T{\tt1}{\tt1}^TA$ is symmetric, so the two terms $dB\,B^T$ and $B\,dB^T$ (transposes of each other) contribute equally to the colon product.

NB: The properties of the trace allow terms in a colon product to be shuffled around, e.g.

$$A:BC \;=\; AC^T:B \;=\; B^TA:C \;=\; \text{etc.}$$

Also recall that the trace of a scalar quantity is equal to the quantity itself.
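As a sanity check, the gradient above can be compared against central finite differences. The sketch below is not part of the original answer: the function names (`f`, `grad_f`, `dM_dB`, `numerical_grad`), the sizes $n=4$, $m=3$ and the random test matrices are all illustrative assumptions. It also checks the entrywise derivative asked for in the question, which follows from the product rule applied to $M_{ab} = \sum_{k,l,p} A_{ak}B_{kl}B_{pl}A_{bp}$, namely $\frac{\partial M_{ab}}{\partial B_{ij}} = A_{ai}(AB)_{bj} + (AB)_{aj}A_{bi}$; summing this over $a$ and $b$ should reproduce $2A^T{\tt1}{\tt1}^TAB$.

```python
import numpy as np

def f(A, B):
    """Sum of all entries of M = A B B^T A^T."""
    M = A @ B @ B.T @ A.T
    return M.sum()

def grad_f(A, B):
    """Closed-form gradient 2 A^T 1 1^T A B, with 1 the all-ones vector."""
    ones = np.ones((A.shape[0], 1))
    return 2.0 * A.T @ ones @ ones.T @ A @ B

def dM_dB(A, B):
    """Entrywise derivative D[a, b, i, j] = dM_ab / dB_ij (product rule)."""
    AB = A @ B
    return (np.einsum("ai,bj->abij", A, AB)
            + np.einsum("aj,bi->abij", AB, A))

def numerical_grad(A, B, eps=1e-6):
    """Central finite-difference approximation of df/dB, entry by entry."""
    G = np.zeros_like(B)
    for i in range(B.shape[0]):
        for j in range(B.shape[1]):
            E = np.zeros_like(B)
            E[i, j] = eps
            G[i, j] = (f(A, B + E) - f(A, B - E)) / (2 * eps)
    return G

# Arbitrary test data (assumed sizes, fixed seed for reproducibility).
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

G = grad_f(A, B)
# The closed form should agree with finite differences ...
assert np.allclose(G, numerical_grad(A, B), atol=1e-5)
# ... and with the entrywise derivative summed over a and b.
assert np.allclose(G, dM_dB(A, B).sum(axis=(0, 1)))
```

Because $f$ is quadratic in $B$, the central difference is exact up to rounding, so a loose tolerance is enough here.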