$\frac{\partial\left( F^{T}F\right)}{\partial F}$ in tensor notation


In index notation, if my calculations are correct, the result should be $$\left(\frac{\partial \left(F^{T}F\right)}{\partial F}\right)_{ijkl} = \frac{\partial\left( F_{mi}F_{mj}\right)}{\partial F_{kl}} = F_{mi}\frac{\partial F_{mj}}{\partial F_{kl}} + F_{mj}\frac{\partial F_{mi}}{\partial F_{kl}} = F_{mi}\delta_{mk}\delta_{jl}+F_{mj}\delta_{mk}\delta_{il} = F_{ki}\delta_{jl}+F_{kj}\delta_{il}.$$
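(For anyone who wants to sanity-check this formula numerically, here is a minimal sketch: it builds $F_{ki}\delta_{jl}+F_{kj}\delta_{il}$ directly with einsum and compares it against the Jacobian of $X \mapsto X^TX$ from forward-mode autodiff.)

import jax
import numpy as np

jax.config.update("jax_enable_x64", True)  # float64, so the comparison is tight

m, n = 5, 4
F = np.random.randn(m, n)
I = np.eye(n)

# D_ijkl = F_ki·δ_jl + F_kj·δ_il, built directly from the index formula ...
D = np.einsum('ki,jl->ijkl', F, I) + np.einsum('kj,il->ijkl', F, I)

# ... compared against the Jacobian of X ↦ XᵀX from forward-mode autodiff.
J = jax.jacfwd(lambda X: X.T @ X)(F)
assert np.allclose(J, D)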

I need to add this to some FEM code that uses tensor notation all the way through, so if possible I'd like to have this in tensor notation too, but I have no idea if, or how, it could be written that way. It doesn't match any combination of the tensor products I could find in the definitions.

(Just to be clear, by "tensor notation" I mean things like $F\otimes I^T+F^T\otimes I$, for example.)

Best Answer

First, note the two conventions:

  1. $A\otimes B = (A_{ik}B_{jl})_{ij,kl} \;\leftrightsquigarrow\; AXB^{\mathsf T} = (A\otimes B)\cdot X \;\leftrightsquigarrow\; \frac{\mathrm{d}\, AXB^{\mathsf T}}{\mathrm{d} X}=A\otimes B$
  2. $A\otimes B = (A_{jl}B_{ik})_{ij,kl} \;\leftrightsquigarrow\; AXB^{\mathsf T} = (B\otimes A)\cdot X \;\leftrightsquigarrow\; \frac{\mathrm{d}\, AXB^{\mathsf T}}{\mathrm{d} X}=B\otimes A$
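(Both are easy to pin down with einsum; a minimal numerical sketch of convention (1):)

import numpy as np

# Convention (1): (A ⊗ B)_ij,kl = A_ik·B_jl, so contracting with X over the
# (k, l) multi-index reproduces A X Bᵀ.
A, B, X = np.random.randn(3, 4), np.random.randn(5, 6), np.random.randn(4, 6)
AoB = np.einsum('ik,jl->ijkl', A, B)
assert np.allclose(np.einsum('ijkl,kl->ij', AoB, X), A @ X @ B.T)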

For example, matrixcalculus.org uses (2). I'll be using (1). So let $F$ be $m\times n$; then:

$$\begin{aligned} \frac{\mathrm{d}\, F'F}{\mathrm{d} F} &= \frac{\partial F'F}{\partial (F', F)} \cdot \frac{\partial(F', F)}{\partial F} \\&= \begin{bmatrix} \mathbb{I}_n\otimes F' & F'\otimes\mathbb{I}_n \end{bmatrix}\cdot\begin{bmatrix} \mathbb{T}_{m,n} \\ \mathbb{I}_{m,n} \end{bmatrix} \\&= (\mathbb{I}_n\otimes F')\cdot\mathbb{T}_{m,n} + (F'\otimes\mathbb{I}_n)\cdot\mathbb{I}_{m,n} \\&= \mathbb{T}_{n,n}\cdot(F'\otimes\mathbb{I}_n) + \mathbb{I}_{n,n}\cdot(F'\otimes\mathbb{I}_n) \\&= (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot(F'\otimes\mathbb{I}_n) \end{aligned}$$

In particular, the directional (Gâteaux) derivative is given by:

$$ \mathbf{D}f(F)\cdot H = (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot(F'\otimes\mathbb{I}_n)\cdot H = (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot F'H = H'F + F'H $$

This agrees with the direct way of computing it via $\frac{\mathrm{d}\, f(F+\varepsilon H)}{\mathrm{d}\varepsilon}\big|_{\varepsilon=0}$.
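(A minimal numerical sketch of that agreement, using jax.jvp for the directional derivative:)

import jax
import numpy as np

jax.config.update("jax_enable_x64", True)

f = lambda X: X.T @ X
F, H = np.random.randn(5, 4), np.random.randn(5, 4)

# jax.jvp evaluates (f(F), Df(F)·H), i.e. the Gâteaux derivative in direction H
_, dfH = jax.jvp(f, (F,), (H,))
assert np.allclose(dfH, H.T @ F + F.T @ H)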

But let me explain the details step by step:

  1. In this context "$\cdot$" generally does not mean matrix multiplication, but an appropriate tensor contraction. For the 4D tensors involved here, typically $A\cdot B = (\sum_{kl} A_{ij,kl}B_{kl,mn})_{ij,mn}$, which is really just regular old matrix multiplication but with multi-indices.
    • In particular, $(A\otimes B)\cdot(C\otimes D)=(AC\otimes BD)$ when dimensions match (this and the other identities here are checked numerically in the sketch after this list).
  2. $\mathbb{I}_{m,n} = (\delta_{ik}\delta_{jl})_{ij,kl} = \mathbb{I}_m \otimes \mathbb{I}_n$ is the identity tensor of shape $(m\times n, m\times n)$.
    • If $A$ is $m\times n$ and $B$ is $m'\times n'$, then $\mathbb{I}_{m,m'}\cdot(A \otimes B) = A \otimes B = (A \otimes B)\cdot\mathbb{I}_{n,n'}$.
  3. $\mathbb{T}_{m,n} = (\delta_{il}\delta_{kj})_{ij,kl}$ is the transpose tensor of shape $(n\times m, m\times n)$.
    • It cannot be written as a pure tensor of the form $A\otimes B$.
    • This follows from $A^{\mathsf T} = \sum_{mn} (e_m e_n^{\mathsf T})\, A\, (e_m e_n^{\mathsf T}) = \sum_{mn} (E_{mn} \otimes E_{nm})\cdot A = \mathbb{T}_{m,n}\cdot A$.
    • I.e. $\sum_{mn} (E_{mn} \otimes E_{nm}) = \sum_{mn}(\delta_{im}\delta_{kn}\delta_{nj}\delta_{ml})_{ij,kl} = (\delta_{il}\delta_{kj})_{ij,kl} = \mathbb{T}_{m,n}$.
    • It satisfies $\mathbb{T}_{m,n}^{\mathsf T} = \mathbb{T}_{n,m}$, where $(A\otimes B)^{\mathsf T} = (A^{\mathsf T}\otimes B^{\mathsf T})$ is the transpose (for tensors).
    • If $A$ is $m\times n$ and $B$ is $m'\times n'$, then $\mathbb{T}_{m,m'}\cdot(A \otimes B) = (B\otimes A)\cdot\mathbb{T}_{n,n'}$.
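Here is that check, as a minimal sketch (otimes and TT are one-line einsum equivalents of the helpers in the demo below):

import numpy as np

def otimes(A, B):
    """(A ⊗ B)_ij,kl = A_ik·B_jl, stored as a 4D array indexed [i, j, k, l]."""
    return np.einsum('ik,jl->ijkl', A, B)

def TT(m, n):
    """Transpose tensor (𝕋ₘ,ₙ)_ij,kl = δ_il·δ_kj, shape (n, m, m, n)."""
    return np.einsum('il,kj->ijkl', np.eye(n), np.eye(m))

dot = lambda S, T: np.tensordot(S, T, axes=2)  # contraction over the multi-index

m, n, m_, n_ = 2, 3, 4, 5
A, B = np.random.randn(m, n), np.random.randn(m_, n_)
C, D = np.random.randn(n, 6), np.random.randn(n_, 7)

# (A ⊗ B) · (C ⊗ D) = AC ⊗ BD
assert np.allclose(dot(otimes(A, B), otimes(C, D)), otimes(A @ C, B @ D))
# 𝕋ₘ,ₙ · A = Aᵀ
assert np.allclose(dot(TT(m, n), A), A.T)
# 𝕋ₘ,ₘ' · (A ⊗ B) = (B ⊗ A) · 𝕋ₙ,ₙ'
assert np.allclose(dot(TT(m, m_), otimes(A, B)), dot(otimes(B, A), TT(n, n_)))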

Source Code Demo

In Python, using https://github.com/google/jax for automatic Jacobian computation.

import jax
import numpy as np

# JAX computes in float32 by default; enable float64 so the exact equality
# checks at the end hold against the float64 numpy results.
jax.config.update("jax_enable_x64", True)

def otimes(A, B):
    """Tensor product A ⊗ B = (Aᵢₖ·Bⱼₗ)ᵢⱼ,ₖₗ"""
    assert A.ndim == 2 and B.ndim == 2
    return np.einsum('ij,kl -> ikjl', A, B)

def II(m, n):
    """Identity tensor (𝕀ₘ,ₙ)ᵢⱼ,ₖₗ = (δᵢₖ·δⱼₗ)ᵢⱼ,ₖₗ = (𝕀ₘ ⊗ 𝕀ₙ)ᵢⱼ,ₖₗ"""
    I = np.zeros((m, n, m, n))
    for i, j, k, l in np.ndindex(I.shape):
        if i == k and j == l:
            I[i, j, k, l] = 1
    return I

def TT(m, n):
    """Transpose tensor (𝕋ₘ,ₙ)ᵢⱼ,ₖₗ = (δᵢₗ·δₖⱼ)ᵢⱼ,ₖₗ"""
    T = np.zeros((n, m, m, n))
    for i, j, k, l in np.ndindex(T.shape):
        if i == l and j == k:
            T[i, j, k, l] = 1
    return T

def f(X):
    return X.T @ X

g = jax.jacfwd(f)

def g_manual_intermediate(X):
    m, n = X.shape
    l = np.tensordot(otimes(np.eye(n), X.T), TT(m, n))
    r = np.tensordot(otimes(X.T, np.eye(n)), II(m, n))
    return l + r

def g_manual(X):
    m, n = X.shape
    return np.tensordot(TT(n, n) + II(n, n), otimes(X.T, np.eye(n)))

m, n = 5, 4
X = np.random.randn(m, n)

assert (g(X) == g_manual_intermediate(X)).all()
assert (g(X) == g_manual(X)).all()