Why is the abstract functorial definition of the tangent bundle not widely accepted

definition, differential-geometry, soft-question, tangent-bundle

The following quote from page 595 of Spivak's Calculus exemplifies my viewpoint on definitions:

It is an important part of a mathematical education to follow a construction of the real numbers in detail, but it is not necessary to refer ever again to this particular construction. It is utterly irrelevant that a real number happens to be a collection of rational numbers, and such a fact should never enter the proof of any important theorem about the real numbers. Reasonable proofs should use only the fact that the real numbers are a complete ordered field, because this property of the real numbers characterizes them up to isomorphism, and any significant mathematical property of the real numbers will be true for all isomorphic fields.

This makes a lot of sense to me! This line of thought is very much in the spirit of mathematical abstraction, and is why, for example, we define vector spaces in terms of their operations rather than geometric properties.

Now, onto the question. Recently I've been learning about the tangent bundle, and the best definition of it that I can see is the following:

There exists a functor $T$, unique up to natural equivalence, from the category of differentiable manifolds to the category of vector bundles, whose restriction to Euclidean spaces is naturally equivalent to the trivialising functor (which sends $\mathbb R^n$ to its trivial bundle and a smooth map to its derivative), and which is compatible with restriction to open submanifolds (the tangent bundle of an open submanifold is the restriction of the tangent bundle). (This definition is given, and existence and uniqueness are proved, in Chapter 3 of Spivak's A Comprehensive Introduction to Differential Geometry.) The image of $M$ under $T$ is the tangent bundle of $M$.
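For concreteness, here is the trivialising functor spelled out (standard formulas, sketched in my own notation rather than Spivak's):

```latex
% The trivialising functor on Euclidean spaces: objects go to trivial
% bundles, and smooth maps go to their derivatives.
T(\mathbb{R}^n) = \mathbb{R}^n \times \mathbb{R}^n, \qquad
T(f)(p, v) = \bigl(f(p),\, Df(p)\,v\bigr).
% Functoriality on a composite is exactly the chain rule:
T(g \circ f)(p, v) = \bigl(g(f(p)),\, Dg(f(p))\,Df(p)\,v\bigr)
                   = \bigl(T(g) \circ T(f)\bigr)(p, v).
```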

In my opinion, this definition is the most natural one for the tangent bundle. It emphasises the locally Euclidean nature of manifolds, as well as the idea that the pushforward is essentially the derivative of a given map. However, I have not seen many people give or endorse this definition. Most people define tangent vectors as equivalence classes of curves, or as point derivations. These definitions feel a lot less fundamental, like the definitions of the real numbers via Dedekind cuts or equivalence classes of Cauchy sequences. Everything you need to know about the tangent bundle can be inferred from the functorial definition, and in a more geometric way, so I can't see why the other definitions remain the default (at most, they allow for easy proofs that the tangent bundle functor $T$ actually exists).

Thus, my question is the following: why isn't the functorial definition more commonly used? What is it about the curve-based or derivation-based definitions that saves them from being "utterly irrelevant"?

Best Answer

I’ll try to address this question in a few ways, and hopefully some of the points will be convincing. I think the most obvious and direct answer to why this definition is not common is simply that it is pretty darn abstract, as I mentioned in my first comment: starting from scratch, one would have to spend half a semester (or a full one) just understanding what is being said (functor, category, vector bundle), so one can forget about covering the other important concepts in differential geometry. Pedagogically speaking, this is of course a terrible idea for 95% of readers. And many textbooks are written (or at least claim to be written) for the sake of teaching, so it is no surprise that such an exposition is not ‘common’. This is why in Spivak’s differential geometry text he doesn’t open with that definition; instead he starts with nice arrows in $\Bbb{R}^n$ and leaves the abstract characterization to an addendum.

I think the above answer about pedagogy is sufficient, but I’ll now try to answer from a broader perspective. I’ll present two main reasons, with examples from other topics, for why it’s better not to give only ‘abstract definitions’, and hence why thinking in terms of curves and derivations (and even the ‘physicist’ definition via the transformation law under changes of chart) is a good thing, and indeed necessary for other problems.


1. Concrete things can give easier proofs.

Ok, let us set aside pedagogy for now (even though it’s a huge deal). Obviously it’s true that many of the ‘standard’ theorems don’t really rely on any particular implementation of the tangent space; they rely only on the functorial property (which leads back to the chain rule of analysis). However, with concrete implementations, certain theorems are more obvious and can be proved more easily. This should be clear even before we bring in category theory and the accompanying universal definitions. For example:

  • Even with Spivak’s quote, “such a fact should never enter the proof of any important theorem about the real numbers”, I would say that there are things which can be proved almost immediately if we do use such facts. For example, defining $\Bbb{R}$ as the Cauchy completion of $\Bbb{Q}$, one can immediately see that there is an isomorphic copy of the rationals sitting inside the reals. On the other hand, if all we use is that the reals are a complete ordered field, then we have to work a little harder, taking a detour into some algebra (e.g. here). Granted, this is not difficult, and indeed we see that the completeness of $\Bbb{R}$ is not needed, but still: without the algebraic detour, one can immediately deduce a theorem about $\Bbb{R}$ simply from how it was constructed.
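To spell out the embedding (a sketch, with $\Bbb{R}$ realized as Cauchy sequences of rationals modulo null sequences):

```latex
% The copy of Q inside the Cauchy completion: send each rational to the
% class of the corresponding constant sequence.
\iota : \mathbb{Q} \to \mathbb{R}, \qquad
\iota(q) = \bigl[(q, q, q, \dots)\bigr].
% One checks directly that \iota is an injective homomorphism of ordered
% fields, so Q sits inside R with no detour into algebra.
```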

  • In linear algebra, sometimes one can give a quick and easy proof using bases and matrices, while the same thing using ‘abstract’ methods can take longer. It is good to know both, of course, but there is something to be said for speed. For example, that a matrix and its transpose have the same rank can be proven fairly directly using row-reduction arguments, and that is pretty quick. One can also go the abstract route through dual spaces, annihilators and the first isomorphism theorem: if $T:V\to W$ is a linear map between finite-dimensional spaces then $\ker(T^*)=[\text{image}(T)]^0\cong(W/\text{image}(T))^*\cong W/\text{image}(T)$, and hence by rank-nullity $\text{rank}(T^*)=\text{rank}(T)$ (I admit, I prefer the abstract approach here, but I can still appreciate the matrix approach).
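As a quick numerical sanity check of the rank statement (the matrix is illustrative data of my choosing, not part of the original argument):

```python
import numpy as np

# An arbitrary rectangular matrix (hypothetical example data)
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])

# rank(A) and rank(A^T) agree, illustrating rank(T*) = rank(T)
rank_A = np.linalg.matrix_rank(A)
rank_At = np.linalg.matrix_rank(A.T)
assert rank_A == rank_At == 2
```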

  • One can define the Banach spaces $L^p(\Bbb{R}^n)$ without ever mentioning the name Lebesgue. Simply define it to be the completion of the vector space $C_c(\Bbb{R}^n)$ with the $p$-norm, using only the Riemann integral. We know such a completion exists and is unique up to isometric isomorphism (thank you, abstraction!). However, I’m sure many people will agree that without developing the corresponding measure theory and Lebesgue integration theory (particularly the three fundamental convergence theorems), the study of the $L^p$ spaces is pretty limited. So, to speak more poetically, sometimes the journey is more important than the destination. Particularly in analysis, where concrete estimates are important (I say this in spite of how much I like abstract functional analysis).
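The completion definition fits in one line (a sketch; the norm uses only the Riemann integral):

```latex
% The p-norm on compactly supported continuous functions:
\|f\|_p = \left( \int_{\mathbb{R}^n} |f(x)|^p \, dx \right)^{1/p},
\qquad f \in C_c(\mathbb{R}^n),
% and L^p is then *defined* as the abstract completion in this norm,
L^p(\mathbb{R}^n) := \overline{\bigl( C_c(\mathbb{R}^n), \|\cdot\|_p \bigr)},
% unique up to isometric isomorphism.
```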

  • In a similar spirit, consider the Sobolev spaces $W^{k,p}(\Bbb{R}^n)$; say $1<p<\infty$ to avoid technicalities. One can define this space abstractly, again as the completion of $C^{\infty}_c(\Bbb{R}^n)$ with the corresponding Sobolev norm. One can prove that there is a unique continuous linear injection $\iota:W^{k,p}(\Bbb{R}^n)\to L^p(\Bbb{R}^n)$ such that for all $f\in C^{\infty}_c(\Bbb{R}^n)$, $\iota(f)= f$ (or rather $[f]$, where $[f]$ is the equivalence class of $f$). In other words, there is an isomorphic copy of the Sobolev space inside the Lebesgue space. However, if one actually goes through the motions, proving injectivity of the map $\iota$ abstractly from first principles is rather tricky. The link only proves it for $p=2$, but one can do it for other $p$ as well, using that the differential operators $\partial^{\alpha}$ acting on $C^{\infty}_c(\Bbb{R}^n)$ are closable, since their adjoints are densely defined (at least on $C^{\infty}_c(\Bbb{R}^n)$). But as you can see from this explanation, this requires a lot of abstract functional analysis: the theory of unbounded operators, their adjoints, and their graphs. On the other hand, in the spirit of the previous bullet point about Lebesgue spaces, one can give a ‘concrete’ definition of $W^{k,p}(\Bbb{R}^n)$ using weak derivatives, and with this definition it is almost automatic that $W^{k,p}(\Bbb{R}^n)$ is a Banach space, and by definition it is contained in $L^p(\Bbb{R}^n)$ (to show this coincides with the previous definition, one has to show $C^{\infty}_c$ is dense in $W^{k,p}$, but this is easily accomplished by the well-known approximation arguments using convolutions). I know this is a complicated example regardless, but my point is that with the concrete definitions, certain properties become almost trivial.
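For comparison, the concrete definition via weak derivatives (standard formulas):

```latex
% g is the weak derivative \partial^\alpha f if integration by parts holds
% against every test function:
\int_{\mathbb{R}^n} f \, \partial^{\alpha}\varphi \, dx
  = (-1)^{|\alpha|} \int_{\mathbb{R}^n} g \, \varphi \, dx
  \quad \text{for all } \varphi \in C^{\infty}_c(\mathbb{R}^n),
% and then the Sobolev space is, by definition, a subspace of L^p:
W^{k,p}(\mathbb{R}^n)
  = \bigl\{ f \in L^p(\mathbb{R}^n) :
      \partial^{\alpha} f \in L^p(\mathbb{R}^n) \text{ weakly, for all } |\alpha| \le k \bigr\}.
```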


2. Having many concrete descriptions gives rise to different ideas for generalization and/or for solving new problems.

This again comes back to the old adage: the journey is (or can be) just as (or even more) important than the destination, and also a mathematical saying: a good definition should be the hypothesis of a theorem.

  • For example, in abstract algebra one has the wonderful definition of the tensor product $V\otimes W$ of vector spaces (or more generally, modules). But how does one even come up with such universal properties? Well, one has some concrete implementation in mind, likes some of its properties, and with some trial and error reproduces an abstract definition which agrees with the concrete one. Ok, so now let’s change the setting. Say I want to consider a tensor product of two Hilbert spaces. Can you immediately give me an abstract universal definition for it? I highly doubt it. You may write down a candidate guess, perhaps starting with the algebraic definition, then try some natural-looking inner products and maybe afterwards take the completion, etc. But trying to give an immediately universal definition is challenging! One needs to have some idea of what one is aiming for before formulating a general definition. This sort of goes back to pedagogy, even though I said I wanted to avoid repeating it: we are humans, so we learn inductively, not deductively. When the final abstract version of a definition seems so far removed from the initial question, it’s a sign that it’s not good to start from there alone (even if it completely characterizes everything).
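To illustrate the trial and error: the ‘natural looking’ inner product one writes down is the following (a sketch of the standard construction):

```latex
% On the algebraic tensor product of Hilbert spaces H_1 and H_2, try
\langle u \otimes v, \; u' \otimes v' \rangle
  := \langle u, u' \rangle_{H_1} \, \langle v, v' \rangle_{H_2},
% extended sesquilinearly. One must check this is well defined and positive
% definite; the Hilbert space tensor product is then the completion
H_1 \,\hat{\otimes}\, H_2 := \overline{H_1 \otimes_{\mathrm{alg}} H_2}
% in the induced norm. None of this is visible from a universal property alone.
```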

  • For a more differential-geometric POV, consider the remark made by @Brevan Ellefsen about working with more general spaces. Once you lower the regularity, things just get more complicated. You can use the curves definition, but not derivations, to define tangent bundles, so if, for instance, you had used derivations when proving the functorial result in the $C^{\infty}$ case, you’d have no idea how to proceed in the $C^1$ case (and mind you, for analysis, $C^1$ is already a luxury). Also, rather than trying to classify everything up to diffeomorphism/isomorphism and making an intrinsic definition, it is sometimes much more convenient (I’d say even essential) to work with things concretely. This is especially true when trying to “classify singularities” of spaces. The easiest example I can think of is Stokes’ theorem. In almost every textbook, the theorem is stated for $C^{\infty}$ manifolds-with-boundary. But stated in this fashion, one can’t even apply it to a simple square with its four pointy corners. One might then try to formulate a version for manifolds-with-corners (and almost every textbook author says it can be done and moves on, because they’re tired of things). But sometimes even corners are not general enough, because something as simple as a cone with its pointy vertex is not a manifold-with-corners. Do we really want to go through every type of singularity (if that’s even possible), make a universal definition (if at all possible), and only then bring things down to reality? I don’t think so. As you can see, trying to answer questions concretely can take you very deep by itself, and here you’ll inevitably be led to parts of geometric measure theory.
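For reference, the smooth statement whose hypotheses cause all the trouble:

```latex
% Stokes' theorem for a compact oriented smooth n-manifold-with-boundary M
% and a smooth (n-1)-form \omega:
\int_M d\omega = \int_{\partial M} \omega.
% A square (corners) or a cone (vertex) fails the hypothesis at its singular
% points, even though the conclusion "ought" to hold there too.
```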

  • In differential geometry, the notion of curvature can be studied from many different perspectives. For example, you can start very geometrically with parallel transport, then be led to the study of holonomy groups, and finally arrive at the ‘infinitesimal version’, which is curvature. Alternatively, you could arrive at things from a very bundle-theoretic point of view. A connection is a choice of horizontal complement to the vertical subbundle. The vertical subbundle is integrable (almost by definition), so one is naturally led to ask when the horizontal bundle is integrable. It turns out this is completely answerable using Frobenius’ theorem, and unwinding what it says essentially gives us a formula for the curvature of a connection, and tells us the horizontal subbundle is integrable if and only if the curvature vanishes. Basically, my point is that even for a single object/concept (curvature in this case), studying it from multiple perspectives gives you much greater insight; definitely much more than merely saying “the Riemann curvature tensor is the unique tensor field such that ___”. This is also why we study so many other curvature tensors in an attempt to understand the full Riemann tensor: the Weyl tensor, Ricci tensor, Ricci scalar, sectional curvatures, etc.
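The ‘infinitesimal version’ mentioned above, in the usual formula (standard notation, for a connection $\nabla$):

```latex
% Curvature as the failure of second covariant derivatives to commute:
R(X, Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z.
% In the bundle picture, Frobenius' theorem says the horizontal distribution
% of the connection is integrable precisely when R vanishes identically.
```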

  • Back to an analysis example. Given two $n\times n$ matrices $A$ and $B$, we can define what it means for them to commute: obviously, $[A,B]:=AB-BA=0$. But when dealing with unbounded operators on a Hilbert space, these compositions may no longer even make sense, and the common domain might end up being reduced to $\{0\}$. This is an unfortunate situation. Fortunately, in the case where $A,B$ are self-adjoint we can still salvage a definition: in the finite-dimensional case, saying they commute is equivalent to saying their resolvents commute, and the latter is a definition which works in the infinite-dimensional case (see Reed–Simon for more).
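A small numerical illustration of the resolvent criterion in finite dimensions (the matrices are examples of my choosing; the infinite-dimensional statement is the one in Reed–Simon):

```python
import numpy as np

# Two commuting self-adjoint matrices: B is a polynomial in A, so [A, B] = 0
# (hypothetical example data)
A = np.array([[2., 1.],
              [1., 2.]])
B = A @ A  # = [[5, 4], [4, 5]]

def resolvent(M, z):
    """(M - z I)^{-1}, defined for z off the (real) spectrum of M."""
    return np.linalg.inv(M - z * np.eye(M.shape[0], dtype=complex))

z, w = 1j, 2j  # non-real points, safely outside both spectra
RA, RB = resolvent(A, z), resolvent(B, w)

# In finite dimensions, AB = BA is equivalent to the resolvents commuting
assert np.allclose(A @ B, B @ A)
assert np.allclose(RA @ RB, RB @ RA)
```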

  • In fact, I can simply quote the entire subject of spectral theory as an example. In finite dimensions, the usual statement is that every self-adjoint square matrix $A$ (i.e. one equal to its conjugate transpose) has an orthonormal basis of eigenvectors. However, in infinite dimensions, unbounded operators need not have eigenvectors, so such a direct generalization is out of the question. As a result, one goes back to the finite-dimensional statement and proves several equivalent reformulations of it, for example using orthogonal projections, or reinterpreting diagonalization as finding a unitary transformation after which the operator becomes multiplication by the eigenvalues. Such reformulations do generalize to infinite dimensions, giving us the multiplication-operator form of the spectral theorem and the projection-valued-measure version, both of which are extremely important, especially in quantum mechanics.
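The ‘multiplication by eigenvalues’ reformulation, checked numerically in finite dimensions (example matrix of my choosing):

```python
import numpy as np

# A self-adjoint (real symmetric) matrix -- hypothetical example data
A = np.array([[2., 1.],
              [1., 2.]])

# eigh returns real eigenvalues and an orthonormal basis of eigenvectors
eigvals, U = np.linalg.eigh(A)

# "Multiplication operator" form: conjugating A by the unitary U turns it
# into multiplication by the eigenvalues (a diagonal matrix)
D = U.T @ A @ U
assert np.allclose(D, np.diag(eigvals))
assert np.allclose(U.T @ U, np.eye(2))  # U is unitary (here: orthogonal)
```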

  • We need to have multiple ways of thinking about the same concept/problem, because then, if the setup of the problem is changed slightly, we can at least attempt to solve things differently. In the comments I gave the example of differential equations. We can’t simply say “Banach’s fixed point theorem gives us existence and uniqueness of solutions” and call it a day for ODEs, because this abstract statement tells us nothing about further properties of solutions. So, to study these, one might try eigenfunction expansions, and here the spectral theory is unbelievably useful. But given a nonlinear equation, boom, 85% of the theory gets thrown out (even more so if the linearization doesn’t give a self-adjoint operator).
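To make the ODE example concrete, here is Picard iteration, the mechanism behind Banach's fixed point theorem, for the toy problem $y'=y$, $y(0)=1$, whose solution is $e^t$ (a minimal sketch; the polynomial bookkeeping is my own):

```python
import math

# Picard iteration for the IVP y' = y, y(0) = 1:
#   y_{n+1}(t) = 1 + \int_0^t y_n(s) ds
# Each iterate is a polynomial, stored as coefficients [c_0, c_1, ...]
# for c_0 + c_1 t + ..., so the integration step is exact.
def picard_step(coeffs):
    # integrate term by term, then add the initial condition y(0) = 1
    return [1.0] + [c / (k + 1) for k, c in enumerate(coeffs)]

y = [1.0]            # y_0(t) = 1
for _ in range(15):  # fifteen Picard iterations
    y = picard_step(y)

# Banach's fixed point theorem guarantees the iterates converge to the
# unique solution; evaluating y_15 at t = 1 should be very close to e.
value_at_1 = sum(y)
assert abs(value_at_1 - math.e) < 1e-12
```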

  • Continuing the previous bullet point: when we come to PDEs, we’re dead in the water, because some PDEs don’t even have solutions. So it’s not enough to simply know the ODE existence-and-uniqueness theorem; we need to develop further techniques, in particular techniques beyond eigenfunction expansions. For example, for hyperbolic PDEs one often uses energy-based arguments and lots of harmonic analysis, while for elliptic PDEs one uses techniques from complex and harmonic analysis, and for both we have the method of a priori estimates.

So, the bottom line is we have to be very flexible.


Some extra ramblings

Although I agree with much of Spivak’s comment, there is obviously a lot which he has left unsaid. The idea of viewing things from multiple perspectives is something which he definitely implements even if he doesn’t explicitly say so: for example, his textbook has many problems building on the material in the chapters, and some problems rely on the problems of previous chapters, and sometimes he outlines different proofs of the same problems and so on.

Next, I have to say that as much as I love abstract definitions in general, particularly when dealing with algebra, I feel that abstraction doesn’t always blend well with analysis and geometry. This is obviously not to say that abstraction has no role in analysis/geometry; only a fool would say that. There has to be a balance, considering pedagogy, convenience and overall context. Here’s a quote from Spivak’s preface to Calculus on Manifolds:

… Yet the proof of this theorem is, in the mathematician's sense, an utter triviality - a straightforward computation. On the other hand, even the statement of this triviality cannot be understood without a horde of difficult definitions from Chapter 4. There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes' Theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise.

Of course we shouldn’t take this too literally and make everything completely abstract. However, there is a lot of truth in this statement, and although he doesn’t do his best job showcasing it in this particular text, I think his differential geometry volumes do so much better. In order to strip away the disguise, and truly understand what is left over, one has to think of things from many different perspectives (wonderfully done in his Volume II).

And a quote from Spivak’s Calculus

In addition to developing the students' intuition about the beautiful concepts of analysis, it is surely equally important to persuade them that precision and rigor are neither deterrents to intuition, nor ends in themselves, but the natural medium in which to formulate and think about mathematical questions.

While he talks about precision and rigor, we could say the same about abstraction and concreteness. There has to be a happy balance (which is obviously subjective; one must decide for oneself) so that we can think about things freely without getting bogged down in unnecessary details; otherwise we end up missing the forest for the trees (extremely likely in such a notorious subject as differential geometry, which, as the joke goes, is the study of things invariant under change of notation).


Edit:

In response to the comment, I decided to elaborate slightly on point 1 with some differential geometry examples.

  • Proving that the tangent bundle of a product is diffeomorphic to the product of the tangent bundles is pretty obvious using curves: send a curve $[(\gamma_M,\gamma_N)]$ to $([\gamma_M],[\gamma_N])$. Of course, there are some checks to be made here, but it is pretty obvious that this is correct. Granted, the typical proof using the projections and injections is not difficult either, and good to know.
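Written out, the curve-level map and its evident inverse realizing the identification:

```latex
% T(M x N) -> TM x TN via curves, with the obvious inverse:
\bigl[\, t \mapsto (\gamma_M(t), \gamma_N(t)) \,\bigr]
  \;\longmapsto\; \bigl( [\gamma_M], [\gamma_N] \bigr),
\qquad
\bigl( [\gamma_M], [\gamma_N] \bigr)
  \;\longmapsto\; \bigl[\, t \mapsto (\gamma_M(t), \gamma_N(t)) \,\bigr].
% The checks: well-definedness on equivalence classes, smoothness, and
% compatibility with the bundle projections.
```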
  • With the universal definition of tangent bundles, it is hard to define maps out of/into tangent bundles, because you don’t know what their elements look like. The universal definition is only significantly more helpful/clean when you have some maps lying around to which you can apply the chain rule (functoriality). Otherwise, you’re essentially going to be composing back and forth with charts so you can reduce to the case of an open set $U\subset\Bbb{R}^n$, where by definition you know the tangent bundle is $TU=U\times\Bbb{R}^n$. This gets pretty cumbersome for any practical purpose. For example, consider the canonical flip in the double tangent bundle. Defining it is a pain in general. But using curves, at least the idea becomes clear. An element of $TTM$ is by definition an equivalence class $[t\mapsto \Gamma(t)]$ where each $\Gamma(t)\in TM$, so we can write this as $[t\mapsto [s\mapsto \gamma(s,t)]]$. The canonical flip then simply exchanges the two parameters: i.e. you send $[t\mapsto [s\mapsto\gamma(s,t)]]$ to $[t\mapsto[s\mapsto\gamma(t,s)]]$. It is clear that this map is an involution, and it makes the corresponding diagram commute. Thinking purely abstractly, it is not even clear that the second tangent bundle possesses such a structure, but using curves it is an obvious question to ask “what happens if I consider $\gamma(t,s)$ instead of $\gamma(s,t)$?”; of course there are some details to be verified, but this at least gets us started.
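In a chart the flip is completely explicit, which makes the curve description above easy to check (the standard coordinate formula):

```latex
% Over a chart U, TTU = U x R^n x R^n x R^n, and the canonical flip swaps
% the two middle ("velocity") slots:
\kappa : TTM \to TTM, \qquad \kappa(x, v, a, w) = (x, a, v, w),
% matching [t -> [s -> gamma(s,t)]] |-> [t -> [s -> gamma(t,s)]].
% Clearly \kappa \circ \kappa = \mathrm{id}, so \kappa is an involution.
```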