I have looked through the Matrix Cookbook and other sources, but I am confused by this problem. If I have a function $F = ABC$, what is the partial derivative of $F$ with respect to $B$? I could not find a clear answer for matrix-valued functions like this one. I did find something akin to $dF = CA\,dX$, but that doesn't seem to make sense, because the dimensions of $C$ and $A$ need not match up for multiplication: for example, $A$ could be $3\times 3$, $B$ could be $3\times 3$, and $C$ could be $3\times 4$. I'd appreciate help on this. Thanks.

Edit: I changed the problem to be closer to what I was originally working with (see the comments below). Here is the problem:

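$f = \operatorname{tr}((ABC)(ABC)^T)$, and I want the partial derivative $\frac{\partial}{\partial B} \operatorname{tr}((ABC)(ABC)^T)$. I ended up using a modified version of copper.hat's answer: I combined $A$ and $B$ into the single matrix $AB$ and got something like $2(ABC)C^T$ for the gradient (this is the gradient with respect to the product $AB$; multiplying on the left by $A^T$ gives the gradient with respect to $B$).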

Thank you both for the help. Both answers below are technically correct; I just accepted the shorter and more convenient form.

## Best Answer

When dealing with functions of this sort, I find it easier to work with the derivative applied to some perturbation $\Delta$.

Let $F(B) = ABC$. Since $F$ is linear in $B$, we have $F(B+\Delta) = F(B) + A\Delta C$, so it follows that $DF(B)(\Delta) = A\Delta C$. However, $DF(B)$ is a linear map from matrices to matrices: if $A$ is $p \times n$, $B$ is $n \times m$ and $C$ is $m \times q$, it maps $\mathbb{R}^{n \times m} \to \mathbb{R}^{p \times q}$ (or $\mathbb{C}$, as the case may be), so there is no particularly convenient representation of it as a single matrix.
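A quick numerical sanity check of $DF(B)(\Delta) = A\Delta C$ may help. This is a minimal sketch in plain Python with random real matrices, assuming the shapes from the question ($A$ and $B$ are $3\times 3$, $C$ is $3\times 4$); the helpers `matmul`, `add`, `rand_matrix` and `max_abs_diff` are hypothetical names written just for this illustration. It also shows that $CA$ is not even defined for those shapes, while $A\Delta C$ has the same $3\times 4$ shape as $F(B)$.

```python
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    if len(X[0]) != len(Y):
        raise ValueError(f"shape mismatch: {len(X)}x{len(X[0])} times {len(Y)}x{len(Y[0])}")
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y, t=1.0):
    """Elementwise X + t*Y."""
    return [[x + t * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(X, Y):
    """Largest absolute entrywise difference between X and Y."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def F(A, B, C):
    return matmul(matmul(A, B), C)


rng = random.Random(0)
A = rand_matrix(3, 3, rng)
B = rand_matrix(3, 3, rng)
C = rand_matrix(3, 4, rng)
Delta = rand_matrix(3, 3, rng)  # perturbation with the same shape as B

# F is linear in B, so F(B + Delta) - F(B) equals A Delta C up to rounding.
lhs = add(F(A, add(B, Delta), C), F(A, B, C), t=-1.0)
rhs = matmul(matmul(A, Delta), C)
err = max_abs_diff(lhs, rhs)
print(f"max |F(B+Delta) - F(B) - A Delta C| = {err:.1e}")
print("shape of A Delta C:", len(rhs), "x", len(rhs[0]))

# By contrast, C A is undefined here: C is 3x4 and A is 3x3.
try:
    matmul(C, A)
except ValueError as exc:
    CA_defined = False
    print("C A:", exc)
else:
    CA_defined = True
```

Because $F$ is linear, no limit is needed: the difference $F(B+\Delta) - F(B)$ already equals the derivative applied to $\Delta$, which is why the check uses a full-size perturbation rather than a small step.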

Note: each coordinate of $DF(B)$ does have a convenient representation. To see this, let $\phi_{ij}(B) = e_i^T F(B) e_j = e_i^T ABC e_j$. Then, as above, we have $D\phi_{ij}(B)(\Delta) = e_i^T A\Delta C e_j$. Since this is a scalar, it equals its own trace, and the cyclic property of the trace gives:
\begin{eqnarray} D\phi_{ij}(B)(\Delta) &=& e_i^T A \Delta C e_j \\ &=& \operatorname{tr} (e_i^T A \Delta C e_j) \\ &=& \operatorname{tr} (e_j e_i^T A \Delta C ) \\ &=& \operatorname{tr} (C e_j e_i^T A \Delta ) \\ &=& \operatorname{tr} ((A^T e_i e_j^TC^T)^T \Delta ) \\ &=& \langle A^T e_i e_j^TC^T, \Delta \rangle_F \end{eqnarray}
where $\langle X, Y \rangle_F = \operatorname{tr}(X^TY)$ is the Frobenius inner product (the inner product that induces the Frobenius norm). With this inner product, the gradient is given by $\nabla \phi_{ij}(B) = A^T e_i e_j^TC^T$.

Another note: if $G(B) = \operatorname{tr} ((ABC)(ABC)^T)$, then you can write $G(B) = \langle ABC, ABC \rangle_F$. Then $G(B+\Delta ) = G(B) + \langle A \Delta C, ABC \rangle_F + \langle ABC, A \Delta C \rangle_F + \langle A \Delta C, A \Delta C \rangle_F$. The last term is $O(\|\Delta\|^2)$, so it does not contribute to the derivative, and since the Frobenius inner product is symmetric for real matrices, you can see that
\begin{eqnarray} DG(B) (\Delta) &=& \langle A \Delta C, ABC \rangle_F + \langle ABC, A \Delta C \rangle_F \\ &=& 2 \langle ABC, A \Delta C \rangle_F \\ &=& 2 \langle A^TABC C^T, \Delta \rangle_F \end{eqnarray}
from which it follows that $\nabla G(B) = 2A^TABC C^T$.
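Both gradients can be checked against central finite differences. The following is a minimal sketch in plain Python, assuming random real matrices with the shapes from the question; the helper names (`matmul`, `transpose`, `frob_inner`, `numeric_grad`, etc.) are hypothetical, and the indices `i, j` are 0-based here rather than the 1-based $e_i, e_j$ of the text. Since $\phi_{ij}$ is linear and $G$ is quadratic in $B$, the central difference is exact up to rounding, so both errors should be tiny.

```python
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    if len(X[0]) != len(Y):
        raise ValueError("inner dimensions must agree")
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*X)]


def frob_inner(X, Y):
    """<X, Y>_F = tr(X^T Y), i.e. the sum of elementwise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def rand_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(X, Y):
    """Largest absolute entrywise difference between X and Y."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def numeric_grad(f, B, h=1e-3):
    """Central-difference gradient of scalar f with respect to each entry of B."""
    grad = [[0.0] * len(B[0]) for _ in B]
    for r in range(len(B)):
        for c in range(len(B[0])):
            Bp = [row[:] for row in B]
            Bm = [row[:] for row in B]
            Bp[r][c] += h
            Bm[r][c] -= h
            grad[r][c] = (f(Bp) - f(Bm)) / (2 * h)
    return grad


rng = random.Random(1)
A = rand_matrix(3, 3, rng)
B = rand_matrix(3, 3, rng)
C = rand_matrix(3, 4, rng)

# phi_ij(B) = e_i^T A B C e_j; claimed gradient A^T e_i e_j^T C^T,
# whose (k, m) entry is A[i][k] * C[m][j].
i, j = 1, 2


def phi(X):
    return matmul(matmul(A, X), C)[i][j]


grad_phi = [[A[i][k] * C[m][j] for m in range(len(B[0]))] for k in range(len(B))]
err_phi = max_abs_diff(numeric_grad(phi, B), grad_phi)


# G(B) = tr((ABC)(ABC)^T) = <ABC, ABC>_F; claimed gradient 2 A^T A B C C^T.
def G(X):
    M = matmul(matmul(A, X), C)
    return frob_inner(M, M)


ABC = matmul(matmul(A, B), C)
grad_G = [[2 * v for v in row]
          for row in matmul(matmul(transpose(A), ABC), transpose(C))]
err_G = max_abs_diff(numeric_grad(G, B), grad_G)

print(f"gradient of phi_ij: max error {err_phi:.1e}")
print(f"gradient of G:      max error {err_G:.1e}")
```

The step size `h` is an arbitrary choice; for a function that is at most quadratic in $B$, the $h^2$ terms cancel in the central difference, so any moderate `h` gives agreement to roughly rounding precision.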