$\frac{\partial\left( F^{T}F\right)}{\partial F}$ in tensor notation


In index notation, if my calculations are correct, the result should be $$\left(\frac{\partial \left(F^{T}F\right)}{\partial F}\right)_{ijkl} = \frac{\partial\left( F_{mi}F_{mj}\right)}{\partial F_{kl}} = F_{mi}\frac{\partial F_{mj}}{\partial F_{kl}} + F_{mj}\frac{\partial F_{mi}}{\partial F_{kl}} = F_{mi}\delta_{mk}\delta_{jl}+F_{mj}\delta_{mk}\delta_{il} = F_{ki}\delta_{jl}+F_{kj}\delta_{il}.$$
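(For anyone who wants to sanity-check this formula numerically, here is a minimal sketch: it builds $F_{ki}\delta_{jl}+F_{kj}\delta_{il}$ directly with einsum and compares it against the Jacobian of $X \mapsto X^TX$ from forward-mode autodiff.)

import jax
import numpy as np

jax.config.update("jax_enable_x64", True)  # float64, so the comparison is tight

m, n = 5, 4
F = np.random.randn(m, n)
I = np.eye(n)

# D_ijkl = F_ki·δ_jl + F_kj·δ_il, built directly from the index formula ...
D = np.einsum('ki,jl->ijkl', F, I) + np.einsum('kj,il->ijkl', F, I)

# ... compared against the Jacobian of X ↦ XᵀX from forward-mode autodiff.
J = jax.jacfwd(lambda X: X.T @ X)(F)
assert np.allclose(J, D)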

I need to add this to some FEM code that uses tensor notation all the way through, so if possible I'd like to have this in tensor notation too, but I have no idea if, or how, it could be written that way. It doesn't match any combination of the tensor products I could find in the definitions.

(Just to be clear, by "tensor notation" I mean things like $F\otimes I^T+F^T\otimes I$, for example.)

Best Answer

First, note the two conventions:

  1. $A\otimes B = (A_{ik}B_{jl})_{ij,kl} \;\leftrightsquigarrow\; AXB^{\mathsf T} = (A\otimes B)\cdot X \;\leftrightsquigarrow\; \frac{\mathrm{d}\, AXB^{\mathsf T}}{\mathrm{d} X}=A\otimes B$
  2. $A\otimes B = (A_{jl}B_{ik})_{ij,kl} \;\leftrightsquigarrow\; AXB^{\mathsf T} = (B\otimes A)\cdot X \;\leftrightsquigarrow\; \frac{\mathrm{d}\, AXB^{\mathsf T}}{\mathrm{d} X}=B\otimes A$
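(Both are easy to pin down with einsum; a minimal numerical sketch of convention (1):)

import numpy as np

# Convention (1): (A ⊗ B)_ij,kl = A_ik·B_jl, so contracting with X over the
# (k, l) multi-index reproduces A X Bᵀ.
A, B, X = np.random.randn(3, 4), np.random.randn(5, 6), np.random.randn(4, 6)
AoB = np.einsum('ik,jl->ijkl', A, B)
assert np.allclose(np.einsum('ijkl,kl->ij', AoB, X), A @ X @ B.T)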

For example, matrixcalculus.org uses (2). I'll be using (1). So let $F$ be $m\times n$; then:

$$\begin{aligned} \frac{\mathrm{d}\, F'F}{\mathrm{d} F} &= \frac{\partial F'F}{\partial (F', F)} \cdot \frac{\partial(F', F)}{\partial F} \\&= \begin{bmatrix} \mathbb{I}_n\otimes F' & F'\otimes\mathbb{I}_n \end{bmatrix}\cdot\begin{bmatrix} \mathbb{T}_{m,n} \\ \mathbb{I}_{m,n} \end{bmatrix} \\&= (\mathbb{I}_n\otimes F')\cdot\mathbb{T}_{m,n} + (F'\otimes\mathbb{I}_n)\cdot\mathbb{I}_{m,n} \\&= \mathbb{T}_{n,n}\cdot(F'\otimes\mathbb{I}_n) + \mathbb{I}_{n,n}\cdot(F'\otimes\mathbb{I}_n) \\&= (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot(F'\otimes\mathbb{I}_n) \end{aligned}$$

In particular, the directional (Gâteaux) derivative is given by:

$$ \mathbf{D}f(F)\cdot H = (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot(F'\otimes\mathbb{I}_n)\cdot H = (\mathbb{T}_{n,n} + \mathbb{I}_{n,n})\cdot F'H = H'F + F'H $$

This agrees with the direct way of computing it via $\frac{\mathrm{d}\, f(F+\varepsilon H)}{\mathrm{d}\varepsilon}\big|_{\varepsilon=0}$.
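(A minimal numerical sketch of that agreement, using jax.jvp for the directional derivative:)

import jax
import numpy as np

jax.config.update("jax_enable_x64", True)

f = lambda X: X.T @ X
F, H = np.random.randn(5, 4), np.random.randn(5, 4)

# jax.jvp evaluates (f(F), Df(F)·H), i.e. the Gâteaux derivative in direction H
_, dfH = jax.jvp(f, (F,), (H,))
assert np.allclose(dfH, H.T @ F + F.T @ H)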

But let me explain the details step by step:

  1. In this context "$\cdot$" generally does not mean matrix multiplication, but an appropriate tensor contraction. For the 4D tensors involved here, typically $A\cdot B = (\sum_{kl} A_{ij,kl}B_{kl,mn})_{ij,mn}$, which is really just regular old matrix multiplication but with multi-indices.
    • In particular, $(A\otimes B)\cdot(C\otimes D)=(AC\otimes BD)$ when dimensions match (this and the other identities here are checked numerically in the sketch after this list).
  2. $\mathbb{I}_{m,n} = (\delta_{ik}\delta_{jl})_{ij,kl} = \mathbb{I}_m \otimes \mathbb{I}_n$ is the identity tensor of shape $(m\times n, m\times n)$.
    • If $A$ is $m\times n$ and $B$ is $m'\times n'$, then $\mathbb{I}_{m,m'}\cdot(A \otimes B) = A \otimes B = (A \otimes B)\cdot\mathbb{I}_{n,n'}$.
  3. $\mathbb{T}_{m,n} = (\delta_{il}\delta_{kj})_{ij,kl}$ is the transpose tensor of shape $(n\times m, m\times n)$.
    • It cannot be written as a pure tensor of the form $A\otimes B$.
    • This follows from $A^{\mathsf T} = \sum_{mn} (e_m e_n^{\mathsf T})\, A\, (e_m e_n^{\mathsf T}) = \sum_{mn} (E_{mn} \otimes E_{nm})\cdot A = \mathbb{T}_{m,n}\cdot A$.
    • I.e. $\sum_{mn} (E_{mn} \otimes E_{nm}) = \sum_{mn}(\delta_{im}\delta_{kn}\delta_{nj}\delta_{ml})_{ij,kl} = (\delta_{il}\delta_{kj})_{ij,kl} = \mathbb{T}_{m,n}$.
    • It satisfies $\mathbb{T}_{m,n}^{\mathsf T} = \mathbb{T}_{n,m}$, where $(A\otimes B)^{\mathsf T} = (A^{\mathsf T}\otimes B^{\mathsf T})$ is the transpose (for tensors).
    • If $A$ is $m\times n$ and $B$ is $m'\times n'$, then $\mathbb{T}_{m,m'}\cdot(A \otimes B) = (B\otimes A)\cdot\mathbb{T}_{n,n'}$.
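Here is that check, as a minimal sketch (otimes and TT are one-line einsum equivalents of the helpers in the demo below):

import numpy as np

def otimes(A, B):
    """(A ⊗ B)_ij,kl = A_ik·B_jl, stored as a 4D array indexed [i, j, k, l]."""
    return np.einsum('ik,jl->ijkl', A, B)

def TT(m, n):
    """Transpose tensor (𝕋ₘ,ₙ)_ij,kl = δ_il·δ_kj, shape (n, m, m, n)."""
    return np.einsum('il,kj->ijkl', np.eye(n), np.eye(m))

dot = lambda S, T: np.tensordot(S, T, axes=2)  # contraction over the multi-index

m, n, m_, n_ = 2, 3, 4, 5
A, B = np.random.randn(m, n), np.random.randn(m_, n_)
C, D = np.random.randn(n, 6), np.random.randn(n_, 7)

# (A ⊗ B) · (C ⊗ D) = AC ⊗ BD
assert np.allclose(dot(otimes(A, B), otimes(C, D)), otimes(A @ C, B @ D))
# 𝕋ₘ,ₙ · A = Aᵀ
assert np.allclose(dot(TT(m, n), A), A.T)
# 𝕋ₘ,ₘ' · (A ⊗ B) = (B ⊗ A) · 𝕋ₙ,ₙ'
assert np.allclose(dot(TT(m, m_), otimes(A, B)), dot(otimes(B, A), TT(n, n_)))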

Source Code Demo

In Python, using https://github.com/google/jax for automatic Jacobian computation.

import jax
import numpy as np

# JAX computes in float32 by default; enable float64 so the exact equality
# checks at the end hold against the float64 numpy results.
jax.config.update("jax_enable_x64", True)

def otimes(A, B):
    """Tensor product A ⊗ B = (Aᵢₖ·Bⱼₗ)ᵢⱼ,ₖₗ"""
    assert A.ndim == 2 and B.ndim == 2
    return np.einsum('ij,kl -> ikjl', A, B)

def II(m, n):
    """Identity tensor (𝕀ₘ,ₙ)ᵢⱼ,ₖₗ = (δᵢₖ·δⱼₗ)ᵢⱼ,ₖₗ = (𝕀ₘ ⊗ 𝕀ₙ)ᵢⱼ,ₖₗ"""
    I = np.zeros((m, n, m, n))
    for i, j, k, l in np.ndindex(I.shape):
        if i == k and j == l:
            I[i, j, k, l] = 1
    return I

def TT(m, n):
    """Transpose tensor (𝕋ₘ,ₙ)ᵢⱼ,ₖₗ = (δᵢₗ·δₖⱼ)ᵢⱼ,ₖₗ"""
    T = np.zeros((n, m, m, n))
    for i, j, k, l in np.ndindex(T.shape):
        if i == l and j == k:
            T[i, j, k, l] = 1
    return T

def f(X):
    return X.T @ X

g = jax.jacfwd(f)

def g_manual_intermediate(X):
    m, n = X.shape
    l = np.tensordot(otimes(np.eye(n), X.T), TT(m, n))
    r = np.tensordot(otimes(X.T, np.eye(n)), II(m, n))
    return l + r

def g_manual(X):
    m, n = X.shape
    return np.tensordot(TT(n, n) + II(n, n), otimes(X.T, np.eye(n)))

m, n = 5, 4
X = np.random.randn(m, n)

assert (g(X) == g_manual_intermediate(X)).all()
assert (g(X) == g_manual(X)).all()