Make sense of this definition of the trace of a tensor

riemannian-geometry, tensors

I'm reading Lee's "Riemannian manifolds: an introduction to curvature". I'm having trouble understanding his definition of the trace (or contraction) operator.

Here's his definition:
[image: Lee's definition of the trace operator]

and here's Lemma 2.1:
[image: Lee's Lemma 2.1]

In his notation, $T^k_l (V)$ is the space of mixed $\binom{k}{l}$-tensors on a finite-dimensional vector space $V$.

I think I understand Lemma 2.1: $T^k_{l+1}(V)$ is the space of tensors that take $k$ vectors and $(l+1)$ covectors as arguments and map them multilinearly to a real number. But if we instead supply only $k$ vectors and $l$ covectors, we can "partially apply" a tensor $F \in T^k_{l+1}(V)$ to them, so that once it receives one more covector argument it returns a real number. This is the same as saying that $F$ is a function of $k$ vectors and $l$ covectors that returns a mapping $V^\ast \rightarrow \mathbb R$ — and something that takes a covector argument and produces a real number is (canonically identified with) a vector.

Assuming that's right (please correct me if I'm wrong), I still don't see what exactly the trace he describes is doing.

My best guess is that, since Lemma 2.1 says an element of $T^k_{l+1}$ can be viewed as a map from $k$ vectors and $l$ covectors to $V$, we could perhaps do something analogous here, this time turning an element of $T^{k+1}_l$ into a map from $k$ vectors and $l$ covectors to $V^\ast$.

I'm especially confused by the sentence starting "More generally, …". When I read it, it seems like he's defining $\mathrm{tr}\ F$ in terms of "the trace of the endomorphism…", but how do we define that trace in the first place?

Best Answer

First step: we need to define the trace of an endomorphism $f:V\to V$, where $\dim V=n$. One way is to take a basis $\beta=\{e_1,\dots, e_n\}$ of $V$, consider the associated matrix $[f]_{\beta}$, and define \begin{align} \text{trace}(f):=\text{trace}([f]_{\beta}):=\sum_{i=1}^n([f]_{\beta})_{ii}, \end{align} i.e. the sum of the diagonal entries of the matrix representation of $f$. This does not depend on the choice of basis: if you use a different basis $\gamma$, then $[f]_{\gamma}=P[f]_{\beta}P^{-1}$ for some invertible matrix $P$, i.e. the two matrices are similar, and well-definedness follows from the cyclic property of traces, $\text{trace}(AB)=\text{trace}(BA)$. Using the isomorphism $\text{End}(V)\cong T^1_1(V)$, we see that $\text{trace}:\text{End}(V)\to\Bbb{R}$ induces a mapping (which, by a slight abuse of language, we still call the 'trace') $\text{trace}:T^1_1(V)\to\Bbb{R}$. If you carry out this isomorphism, you'll see that it amounts to taking a basis $\{e_1,\dots, e_n\}$ of $V$ and the dual basis $\{\epsilon^1,\dots, \epsilon^n\}$ of $V^*$; the trace of a $(1,1)$ tensor $F$ is then \begin{align} \text{trace}(F)&=\sum_{i=1}^nF(\epsilon^i,e_i). \end{align}
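If you like to see this numerically, here is a quick sanity check of basis independence (a sketch using numpy; the tensor $F$ and the change-of-basis matrix $P$ are arbitrary random examples, not anything from Lee's book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A (1,1)-tensor F on R^n, stored as the matrix F[i, j] = F(epsilon^i, e_j)
# in the standard basis (hypothetical random example).
F = rng.standard_normal((n, n))

# trace(F) = sum_i F(epsilon^i, e_i): the sum of the diagonal entries.
tr_standard = sum(F[i, i] for i in range(n))

# Change of basis: the matrix of the corresponding endomorphism
# transforms by similarity, [f]_gamma = P^{-1} [f]_beta P.
P = rng.standard_normal((n, n))  # invertible with probability 1
F_new = np.linalg.inv(P) @ F @ P

# The trace is unchanged, illustrating basis independence.
assert np.isclose(np.trace(F_new), tr_standard)
```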

So far, we have defined the trace of a $(1,1)$ tensor. A natural question is whether we can define an analogous operation for higher-order tensors. Let $F$ be a $(k+1,l+1)$ tensor in your notation, where $k,l\geq 0$; this means $F$ is a multilinear map $(V^*)^{l+1}\times V^{k+1}\to\Bbb{R}$. Fix two integers $i$ and $j$ with $1\leq i\leq l+1$ and $1\leq j\leq k+1$. We can now define a map $C_{ij}:T^{k+1}_{l+1}(V)\to T^k_l(V)$, which we shall call the '$i,j$ contraction map', as follows: take $\omega^1,\dots, \omega^l\in V^*$ and $v_1,\dots, v_k\in V$, and define $C_{ij}(F)\in T^k_l(V)$ so that its value on these arguments is \begin{align} \text{trace}\bigg( F(\omega^1,\dots, \omega^{i-1}, \underbrace{\star}_{\text{$i^{th}$ covector slot}}, \omega^i,\dots, \omega^l, v_1,\dots, v_{j-1},\underbrace{\star}_{\text{$j^{th}$ vector slot}},v_j,\cdots, v_k)\bigg). \end{align} In words, we take $k$ vectors $v_1,\dots, v_k$ and $l$ covectors $\omega^1,\dots, \omega^l$ and feed them into $F$ (which has $l+1$ open slots for covectors and $k+1$ open slots for vectors), leaving the $i^{th}$ covector slot and the $j^{th}$ vector slot empty. With these two slots left open, we have a $(1,1)$ tensor, so by the first paragraph we can take its trace and get a number.
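In components, this is just summing over a paired upper and lower index, so it is easy to check the definition numerically (a sketch using numpy; the $(2,2)$ tensor and the chosen slots are an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# A (2,2)-tensor F, stored as F[a, b, c, d] = F(eps^a, eps^b, e_c, e_d):
# the first two indices are covector slots, the last two are vector slots.
F = rng.standard_normal((n, n, n, n))

# C_{1,2}: leave the 1st covector slot and the 2nd vector slot open,
# then take the trace of the resulting (1,1)-tensor:
# (C_{12} F)(omega, v) = sum_i F(eps^i, omega, v, e_i).
C12_F = np.einsum('ibci->bc', F)

# Check against the definition, entry by entry.
expected = np.zeros((n, n))
for b in range(n):
    for c in range(n):
        expected[b, c] = sum(F[i, b, c, i] for i in range(n))
assert np.allclose(C12_F, expected)
```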

So that's the definition. Here I've given this mapping the name $C_{ij}$ to mean the '$i,j$ contraction', but it's also common to call it $\text{tr}_{ij}$ to mean the trace over the $i,j$ slots. Often, we may dispense with notation like $C_{ij}$ or $\text{tr}_{ij}$, and simply say in words "take the trace/contraction of the tensor $F$ over its $i^{th}$ covector and $j^{th}$ vector slots".

For concreteness, let's say $F$ is a $(3,2)$ tensor, meaning a multilinear map $F:V^*\times V^*\times V\times V\times V\to\Bbb{R}$, and say I want to take the trace over the first covector slot and the second vector slot (i.e. $C_{12}$ or $\text{tr}_{12}$). Then $\text{tr}_{12}(F):V^*\times V\times V\to\Bbb{R}$ is the map such that for all $\omega\in V^*$ and $u,v\in V$, \begin{align} (\text{tr}_{12}F)(\omega, u,v)&:=\text{trace}\bigg(F(\star,\omega, u,\star, v)\bigg)=\sum_{i=1}^nF(\epsilon^i,\omega,u,e_i,v). \end{align}


For a slightly more abstract perspective on traces, see this answer of mine. The point is we can take any number of vector spaces $V_1,\dots, V_p$ and form the tensor product space $V_1\otimes\cdots\otimes V_p$. As long as we have one copy of $V$ and one copy of $V^*$ in the tensor product (i.e. there exist distinct indices $i,j\in\{1,\dots, p\}$ such that $V_i=V_j^*$), we can define a trace/contraction mapping over those spaces, thereby obtaining a linear map $V_1\otimes\cdots\otimes V_p\to V_1\otimes\cdots\widehat{V_i}\otimes\cdots\otimes \widehat{V_j}\otimes\cdots\otimes V_p$, where the hat means omit that space in the tensor product.

We can generalize this idea further. Suppose $V_1,\dots, V_p$ are any vector spaces. Suppose we fix distinct indices $i,j$, and that we have a bilinear map $\mu:V_i\times V_j\to\Bbb{R}$. Then, we can define a 'contraction with respect to $\mu$' to be the unique linear map $\tilde{\mu}:V_1\otimes\cdots\otimes V_p\to V_1\otimes\cdots\widehat{V_i}\otimes\cdots\otimes \widehat{V_j}\otimes\cdots\otimes V_p$ such that for all pure tensors, we have \begin{align} \tilde{\mu}(v_1\otimes\cdots\otimes v_p)&=\mu(v_i,v_j)\cdot v_1\otimes\cdots \otimes\widehat{v_i}\otimes\cdots\otimes\widehat{v_j}\otimes\cdots\otimes v_p. \end{align} The previous paragraph is the special case where $\mu:V\times V^*\to\Bbb{R}$ is the evaluation pairing, $\mu(v,\omega)=\omega(v)$.
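To make this generalized contraction concrete, here is a numpy sketch (the bilinear map $\mu$, represented by the matrix `g`, and the vectors are arbitrary examples; I take $p=3$ with all factors equal to $\Bbb{R}^n$ and contract the first two):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

# A bilinear map mu: V x V -> R, represented by its matrix g,
# so mu(v, w) = v^T g w (hypothetical example, e.g. an inner product).
g = rng.standard_normal((n, n))

# Contraction with respect to mu on V (x) V (x) V, collapsing the
# first two factors: (mu~ T)[c] = sum_{a,b} g[a, b] * T[a, b, c].
def contract_mu(T):
    return np.einsum('ab,abc->c', g, T)

# On a pure tensor v1 (x) v2 (x) v3 this reduces to mu(v1, v2) * v3,
# matching the defining formula above.
v1, v2, v3 = rng.standard_normal((3, n))
pure = np.einsum('a,b,c->abc', v1, v2, v3)
assert np.allclose(contract_mu(pure), (v1 @ g @ v2) * v3)
```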