Is this the correct equation for the derivative of a matrix w.r.t. a matrix?

matrices, matrix-calculus, tensors

I've been trying to figure out how to take derivatives of tensors (interpreted as multi-dimensional arrays), but the only resources I've found require a lot of differential-geometry machinery, so I'm not sure I understand it correctly; hence this question.

Write the matrix multiplication $C=AB$ in tensor notation (Einstein notation/Ricci notation?) as $C^i{}_j = A^i{}_kB^k{}_j$, where these are real-valued matrices. Now write the derivative $\frac {\partial C(A,B)}{\partial A}$ as $$\frac {\partial C^i{}_j}{\partial A^p{}_q}=\delta^i{}_p B^q{}_j.$$ Is this correct?
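The proposed formula can be sanity-checked numerically: build the candidate fourth-order array $\delta^i{}_p B^q{}_j$ and compare it against a finite-difference perturbation of $C = AB$. This is an illustrative sketch (the names, shapes, and tolerances are my own, not from the post):

```python
import numpy as np

# Numerical check of the proposed formula dC[i,j]/dA[p,q] = delta(i,p) * B[q,j].
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Candidate derivative as a 4th-order array D[i,j,p,q] = I[i,p] * B[q,j]
I = np.eye(n)
D = np.einsum('ip,qj->ijpq', I, B)

# Finite-difference check: perturb A[p,q] and watch how C = A @ B responds
eps = 1e-6
D_fd = np.empty_like(D)
for p in range(n):
    for q in range(n):
        dA = np.zeros((n, n))
        dA[p, q] = eps
        D_fd[:, :, p, q] = ((A + dA) @ B - A @ B) / eps

assert np.allclose(D, D_fd, atol=1e-4)
```

Since $C$ is linear in $A$, the finite-difference quotient agrees with the formula up to floating-point error.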

Best Answer

$ \def\a{\alpha}\def\b{\beta}\def\g{\gamma}\def\t{\theta} \def\l{\lambda}\def\s{\sigma}\def\e{\varepsilon} \def\o{{\tt1}}\def\p{\partial} \def\A{{\cal A}}\def\B{{\cal B}}\def\C{{\cal C}} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Define the single and double contraction products between tensors as $$\eqalign{ \F &= \A\cdot\B \quad&\implies\quad &\F_{ij\ell ps} &= \sum_{k=\o}^n \A_{ij\c{k}}\B_{\c{k}\ell ps} \\ \G &= \A:\B \quad&\implies\quad &\G_{i\ell ps} &= \sum_{j=\o}^m\sum_{k=\o}^n \A_{i\c{jk}}\B_{\c{jk}\ell ps} \\ }$$

Now consider the fourth-order tensor whose components (in terms of Kronecker delta symbols) are $$\E_{ijk\ell} = \delta_{ik}\delta_{j\ell}$$ This tensor is the identity with respect to the double contraction product. Further, it can be used to rearrange ordinary matrix products, i.e. $$\eqalign{ &A = \E:A = A:\E \\ &A\cdot B\cdot C = \LR{A\cdot\E\cdot C^T}:B \\ }$$

Applying these ideas to the product in question yields $$\eqalign{ C &= A\cdot B \\ dC &= dA\cdot B = \LR{\E\cdot B^T}:dA \\ \grad{C}{A} &= \E\cdot B^T \\ }$$

In component form, this becomes $$\grad{C_{ij}}{A_{k\ell}} = \sum_{p=\o}^n \E_{ijk\c{p}} B_{\c{p}\ell}^T = \sum_{p=\o}^n \delta_{ik}\delta_{j\c{p}} B_{\ell\c{p}} = \delta_{ik} B_{\ell j}$$ which, after relabeling $(k,\ell)\to(p,q)$, is exactly the formula in your question. So yes, your equation is correct.
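The identity tensor $\E$ and the two rearrangement identities above can be checked directly with `np.einsum`; the following is a minimal sketch (variable names and sizes are illustrative):

```python
import numpy as np

# Sketch of the 4th-order identity E[i,j,k,l] = delta(i,k)*delta(j,l)
# and the rearrangement A.B.C = (A . E . C^T) : B, via np.einsum.
rng = np.random.default_rng(1)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

I = np.eye(n)
E = np.einsum('ik,jl->ijkl', I, I)

# E is the identity for the double contraction: E : A = A
assert np.allclose(np.einsum('ijkl,kl->ij', E, A), A)

# Rearrangement: A @ B @ C == (A . E . C^T) : B
X = np.einsum('ip,pjkl,lm->ijkm', A, E, C.T)      # A . E . C^T
assert np.allclose(np.einsum('ijkm,km->ij', X, B), A @ B @ C)

# Gradient from the derivation: dC/dA = E . B^T, with components delta(i,k)*B[l,j]
G = np.einsum('ijkp,pl->ijkl', E, B.T)
assert np.allclose(G, np.einsum('ik,lj->ijkl', I, B))
```

Each single contraction pairs one red index, and each double contraction pairs two, exactly as in the component sums above.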


Note that if you are working in a flat space (which is the case for most engineering and business uses of multidimensional arrays), there is no need to distinguish between covariant/contravariant components.

Therefore you can use a simplified notation wherein all indices are written as subscripts. And the Einstein convention applies to any repeated subscript, e.g. $$ A_{ij}B_{jk} \quad\implies\quad \sum_{j=\o}^n A_{i\c{j}}B_{\c{j}k} $$
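This all-subscripts convention maps one-to-one onto `np.einsum`, where a subscript repeated across operands is summed over (a small illustrative example, not from the post):

```python
import numpy as np

# A_ij B_jk: the repeated subscript j is summed over, per the Einstein convention.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

C = np.einsum('ij,jk->ik', A, B)   # same as the explicit sum over j
assert np.allclose(C, A @ B)
```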
