[Math] Why would we expect the pushforward to encode the total derivative of a smooth map

differential-geometry, manifolds

According to Lee, the pushforward was invented to give a coordinate-independent definition of the total derivative of a smooth map between two smooth manifolds. To each smooth map $F:M \to N$ and each point $p \in M$ we associate a linear map $F_*:T_pM\to T_{F(p)}N$ defined by $(F_*X)(f) = X(f \circ F)$, where $X$ is any derivation in $T_pM$ and $f:N \to \mathbb{R}$ is any smooth function. Given a smooth chart $\phi: U \to \mathbb{R}^m$ on a neighborhood $U$ of $p$, we have a basis for $T_pM$ given by $\frac{\partial}{\partial x^i}|_{p} = (\phi ^{-1})_*\frac{\partial}{\partial x^i}|_{\phi(p)}$. Similarly, given a smooth chart $\psi$ near $F(p)$, we have a basis for $T_{F(p)}N$ given by $\frac{\partial}{\partial y^j}|_{F(p)} = (\psi ^{-1})_*\frac{\partial}{\partial y^j}|_{\psi(F(p))}$. A calculation in Lee shows that the matrix representation of $F_*$ with respect to these bases is the total derivative of the coordinate representation $\hat{F} = \psi \circ F \circ \phi ^{-1}$ evaluated at $\phi(p)$.

My question is, is there some intuitive reason why we would expect this to be true? This all seems very abstract to me. I can't tell if it is supposed to be obvious that this definition should be a coordinate independent way of encoding the total derivative of $F$ and I am just missing something, or if it is just difficult to understand. How should I think about the pushforward?

Best Answer

As I understand your question, you want to know why the definition $F_*X(f) := X(f \circ F)$ is an appropriate generalization of the total derivative. In other words, knowing only the definition of the total derivative, how would one come to this definition of pushforward?

Qiaochu's comment is the key: it comes down to the way directional derivatives relate to derivations. Let's flesh out this idea by recalling some multivariable calculus.

Let $F\colon \mathbb{R}^m \to \mathbb{R}^n$ be smooth, and let $D_pF\colon \mathbb{R}^m \to \mathbb{R}^n$ denote the total derivative at $p \in \mathbb{R}^m$. To each vector $w \in \mathbb{R}^n$ (based at $F(p)$), we associate a derivation at $F(p) \in \mathbb{R}^n$ (summation over repeated indices is implied throughout): $$w \in \mathbb{R}^n \mapsto w^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)}.$$ In particular, for $v \in \mathbb{R}^m$ (based at $p$), $$D_pF(v) \in \mathbb{R}^n \mapsto D_pF(v)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)}.$$ And in fact, this derivation on the right-hand side is none other than $$\left.v^i\frac{\partial}{\partial x^i}\right|_p(-\circ F),$$ that is, the map sending a smooth function $f$ on $\mathbb{R}^n$ to $v^i \left.\frac{\partial (f \circ F)}{\partial x^i}\right|_p$.

To see this, we just use the chain rule: $$\begin{align*} v^i \left.\frac{\partial}{\partial x^i}\right|_p(-\circ F) & = v^i \left.\frac{\partial F^j}{\partial x^i}\right|_p \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \\ & = v^i D_pF(e_i)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \\ & = D_pF(v)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)} \end{align*}$$
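The chain-rule identity above can also be checked numerically. The following is only a sketch: the map `F`, the test function `f`, the point `p`, and the vector `v` are arbitrary choices made for illustration (none of them come from the discussion above), and all derivatives are approximated by central differences rather than computed exactly.

```python
import math

def F(x):
    # Hypothetical smooth map R^2 -> R^3, chosen only for illustration
    x1, x2 = x
    return (x1 * x2, math.sin(x1), x1 + x2 ** 2)

def f(y):
    # Hypothetical smooth test function R^3 -> R
    y1, y2, y3 = y
    return y1 * y3 + math.exp(y2)

def directional_derivative(g, p, v, h=1e-6):
    # Central-difference approximation of d/dt g(p + t v) at t = 0,
    # i.e. the derivation v^i d/dx^i|_p applied to g
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return (g(plus) - g(minus)) / (2 * h)

def total_derivative(G, p, v, h=1e-6):
    # Central-difference approximation of D_pG(v), component by component
    plus = G([pi + h * vi for pi, vi in zip(p, v)])
    minus = G([pi - h * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

p = [0.7, -1.2]
v = [2.0, 0.5]

# Left side: the derivation v^i d/dx^i|_p applied to f o F
lhs = directional_derivative(lambda x: f(F(x)), p, v)

# Right side: the derivation D_pF(v)^j d/dx^j|_{F(p)} applied to f
w = total_derivative(F, p, v)
rhs = directional_derivative(f, list(F(p)), w)

# The two numbers should agree up to finite-difference error
print(lhs, rhs, abs(lhs - rhs) < 1e-5)
```

Swapping in a different smooth `F` or `f` should leave the agreement intact, since the identity holds for every smooth test function.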

Alternatively, it suffices to note that both derivations give the same value when applied to each coordinate function $x^k$, since a derivation at a point of $\mathbb{R}^n$ is determined by its values on the coordinate functions: $$D_pF(v)^j\left.\frac{\partial x^k}{\partial x^j}\right|_{F(p)} = D_pF(v)^k = v^i D_pF(e_i)^k = \left.v^i\frac{\partial F^k}{\partial x^i}\right|_p = v^i\left.\frac{\partial}{\partial x^i}\right|_p(x^k \circ F).$$

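Point: The derivation at $F(p) \in \mathbb{R}^n$ given by $$v^i\partial_i|_{p}(-\circ F)$$ is exactly $$D_pF(v)^j \left.\frac{\partial}{\partial x^j}\right|_{F(p)}.$$ In other words, writing $X = v^i\partial_i|_p$, the rule $f \mapsto X(f \circ F)$ is the derivation that corresponds to the vector $D_pF(v)$. That rule makes no reference to coordinates, so it still makes sense on a manifold, and it is what the definition $F_*X(f) := X(f \circ F)$ records.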
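As a concrete illustration, here is a worked example under assumed choices that are not part of the discussion above: take $F\colon \mathbb{R}^2 \to \mathbb{R}^2$ to be the polar coordinate map $F(r,\theta) = (r\cos\theta, r\sin\theta)$, write $(x,y)$ for the coordinates on the target, and take $v = \left.\frac{\partial}{\partial\theta}\right|_{(r,\theta)}$. For any smooth $f$ on the target, the chain rule gives $$\begin{align*} \left.\frac{\partial}{\partial \theta}\right|_{(r,\theta)}(f \circ F) & = \frac{\partial f}{\partial x}\bigl(F(r,\theta)\bigr)\,(-r\sin\theta) + \frac{\partial f}{\partial y}\bigl(F(r,\theta)\bigr)\,(r\cos\theta) \\ & = \left(-r\sin\theta \left.\frac{\partial}{\partial x}\right|_{F(r,\theta)} + r\cos\theta \left.\frac{\partial}{\partial y}\right|_{F(r,\theta)}\right) f. \end{align*}$$ The coefficients $(-r\sin\theta,\ r\cos\theta)$ are the second column of the Jacobian matrix of $F$ at $(r,\theta)$, which is $D_{(r,\theta)}F(e_2)$, as the general statement predicts.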