Motivation for vector spaces

complex numbers, definition, education, soft-question, vector-spaces

For instance I can make a narrative for defining some familiar sets, leading up to complex numbers, as follows:

  1. Natural numbers are obvious. They are "ordered sets" useful for counting things, and we can define addition and multiplication on them in a useful way.
  2. Integers were defined because we wanted additive inverses, e.g. $5+x=3$. This is also an "ordered set."
  3. Rationals were defined because we wanted multiplicative inverses, e.g. $3x=2$, and now we have a "field" and since they are ordered, an "ordered field."
  4. Reals were defined because we wanted to solve $x\cdot x = 2$, these are also an "ordered field."
  5. Complex numbers were defined because we wanted $x^2=-1$. We defined multiplication and addition specifically so that they satisfy the field axioms, which makes complex numbers also a "field." (We lose "ordered" because we can't define it in a consistent way, but the reals are a subfield which we can order so we're still ok.)

Ok, so now we have ordered sets and fields and ordered fields. Is there a similar narrative for defining "vector spaces?"

Most textbooks simply start with "here are fields and here are the axioms for vector spaces over fields." That is, what is the sequence of logic that led us to invent $\mathbb{R}^N$ or $\mathbb{C}^N$, along with the addition/scaling/identity/closure axioms that go with it? Why didn't we introduce the concept of multiplication (I don't mean inner/outer/cross product) into the definition?

Best Answer

(Disclaimer: This is not at all a historical account - it is a path of intuition that leads to vector spaces, not the only path)

Vector spaces! (a.k.a. stuff you can add and scale, but not multiply)

Vector spaces are intentionally capturing the notion of quantities that can be sensibly added together or scaled, but not necessarily anything else. This situation arises naturally in various phenomena - I'll discuss two simple examples: durations and displacements. Then, we can discuss why a weaker set of axioms, extracted only from the goal of having addition and scaling, is a useful thing - especially when this is a suitable model of natural objects we care about. Finally, we get to $\mathbb R^n$ (and its rules for addition and scaling) which arises through the use of coordinates on (finite dimensional) vector spaces.


Example 1: The space of durations

For durations, we can clearly have some idea that it ought to be possible to take two lengths of time and sum them together - certainly expressions such as $$1\text{ minute} + 17\text{ seconds}$$ $$1\text{ hour} + 34\text{ minutes}$$ make sense, where we can just imagine placing such intervals of time end-to-end and will naturally run into such things. It also makes sense to scale a duration - I can say that an hour is sixty times as long as a minute, and generally multiplications such as $$2\cdot( 51\text{ seconds})$$ $$\frac{1}7\cdot (1 \text{ hour})$$ clearly refer to meaningful durations - where I can take a duration and a positive scaling factor and get a new duration. What doesn't make sense*, however, is to try to multiply two durations: $$(1\text{ hour})\cdot (1\text{ hour}) =\,\,???$$ $$(60\text{ minutes})\cdot (60\text{ minutes}) =\,\,???$$ Sure, you could come up with arbitrary definitions for multiplication of durations, but it wouldn't really mean anything tangible. We could get around this sort of issue if we all agreed "a duration will always be represented as a real number, namely its length in minutes," but this would be a completely arbitrary choice and wouldn't tell us anything new about durations. If we want to stick to the natural phenomenon, we're left with only a couple of operations on durations: we can add and we can scale. We might envision a duration as something that lives on a timeline - just being some segment of that line that we can lay end to end to add or dilate to scale - but that's about all you can do with line segments without adding any extra information.

You might generalize a bit to add some sign information to this - maybe we want to discuss how 5:00 PM is one hour before 6:00 PM, but 7:00 PM is one hour after. Okay, fine, we can revise our model to imagine that durations are more like arrows between points on a timeline - they have a length and a direction. This is much like how one gets from natural numbers to integers.

At this point all we really know is that we can add and subtract signed durations, as well as scale them by real factors. There's our first example of a vector space, albeit a one-dimensional one.
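As a rough illustration (not part of the original argument), Python's standard `datetime.timedelta` happens to behave much like the signed durations described above: it supports addition, negation and scaling by a number, but refuses to multiply two durations. It is only an approximation of a real vector space, since it has microsecond resolution and a bounded range; the variable names below are made up for this sketch.

```python
from datetime import timedelta

minute = timedelta(minutes=1)
hour = timedelta(hours=1)

# Addition: lay two durations end to end.
total = hour + 34 * minute            # 1 hour + 34 minutes

# Scaling: a number times a duration is again a duration.
doubled = 2 * timedelta(seconds=51)   # 2 * (51 seconds)
seventh = hour / 7                    # (1/7) * (1 hour), rounded to microseconds

# Signed durations: "one hour before" is the negative of "one hour after".
before = -hour

# Multiplying two durations is simply not defined.
try:
    hour * hour
    product_defined = True
except TypeError:
    product_defined = False
```

Here `total` equals 94 minutes, `hour + before` is the zero duration, and `product_defined` ends up `False`.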


Example 2: The space of displacements

Displacements (changes in position) are a more canonical example of the same phenomenon - and are absolutely fundamental in any pursuit of physics. For the purposes of this post, let's imagine** that Earth is flat to avoid clever people poking holes in things. A displacement is just a relationship between points - for instance, I could say "this road is 5 miles north of that road" or "that road is 5 miles south of this road" or "that satellite is 254 miles above me." I could geometrically imagine them as just arrows in space, but where I'm not interested in where the arrow starts and ends, but only in the size and direction of the arrow itself, so that "5 miles north" is a displacement regardless of where I base it.

It should be clear that we can add these things together - certainly, "move 10 feet North, then 50 kilometers downwards" is a sensible way to describe the spatial relationship of two points, and can visually be thought of as laying two arrows end to end - and this is needed to say, for instance, "my net change in position over two days is the sum of the change in the first day with the change in the second day," or to, more sophisticatedly, develop calculus to deal with the motion of objects through space and say "the net change in position is the integral of velocity (rates of displacement) over time."

Similarly, it makes sense to scale these arrows - saying "it's thrice three furlongs east" is sensible as is "it's half an angstrom above." We can also negate displacements by reversing their direction and even define scaling by negative factors to be scaling by a positive, then reversing direction.

Again, multiplication does not make sense: what is "two inches east times five meters north"? In fact, it makes even less sense than with durations, because now we have the issue of direction - where exactly is east times north? How's it relate to up times up? That's all utter nonsense!

All we really know is that we can add and subtract displacements, as well as scale them by real factors. Hm, if only we had a name for stuff that we can add and scale...


Vector spaces, abstractly

A vector space is meant to model this kind of behavior where it's possible to add and scale things - and we can sort of work out what axioms we expect to hold. There's certainly plenty of work there, but once we've decided that we want to talk about addition and scaling as two operations, the vector space axioms are all fairly reasonable to assume, even if their necessity may not immediately be apparent. It's not so objectionable to ask that there be a zero vector, or that addition be associative and commutative, or that additive inverses exist, nor should statements like "scaling by $a$ then $b$ is the same as scaling by $ab$" or "scaling $x+y$ by $a$ should give $a\cdot x + a\cdot y$" or "scaling $x$ by $a+b$ should give $a\cdot x + b\cdot x$" or "scaling by $1$ shouldn't do anything" be controversial - and that's literally all that the vector space axioms say.
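Written out symbolically, those statements look as follows. The notation is an assumption made here for illustration: $x, y, z$ are vectors, $a, b$ are scalars from the field, $0$ is the zero vector and $-x$ is the additive inverse of $x$.

$$\begin{aligned}
x + 0 &= x, & x + (-x) &= 0,\\
(x + y) + z &= x + (y + z), & x + y &= y + x,\\
b\cdot(a\cdot x) &= (ab)\cdot x, & 1\cdot x &= x,\\
a\cdot(x + y) &= a\cdot x + a\cdot y, & (a + b)\cdot x &= a\cdot x + b\cdot x.
\end{aligned}$$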

The trick of this sort of axiomatization is this: a lot of things satisfy these axioms, be it durations, displacements, weights, velocities, or accelerations - or even more exotic things such as waves or abstract ones such as $\mathbb R^n$ or "the space of solutions to a recurrence relation" or "the space of continuous functions $\mathbb R\rightarrow\mathbb R$." This is to say: the axioms aren't too strong to exclude having lots of examples. However, the axioms are strong enough to prove useful facts - such as bringing ideas of matrices and bases and dimension to bear on problems in any of these instances - and we know we've done something right when we find a set of axioms that gives us abstract theorems that apply to loads of situations that we cared about before we had the axioms.
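To make one of the more exotic examples concrete, here is a small sketch of "the space of solutions to a recurrence relation." The recurrence $s_{n+2} = s_{n+1} + s_n$ and all function names are choices made for this illustration, and only a finite prefix of each sequence is checked. Sums and scalar multiples of solutions are again solutions, while the termwise product of two solutions generally is not.

```python
def solve(s0, s1, length=10):
    """First `length` terms of the sequence with s[n+2] = s[n+1] + s[n]."""
    terms = [s0, s1]
    while len(terms) < length:
        terms.append(terms[-1] + terms[-2])
    return terms[:length]

def satisfies_recurrence(seq):
    """Check s[n+2] == s[n+1] + s[n] on the given finite prefix."""
    return all(seq[n + 2] == seq[n + 1] + seq[n] for n in range(len(seq) - 2))

def add(u, v):
    """Termwise sum of two sequences."""
    return [x + y for x, y in zip(u, v)]

def scale(c, u):
    """Scale every term of a sequence by c."""
    return [c * x for x in u]

fib = solve(0, 1)      # 0, 1, 1, 2, 3, 5, ...
lucas = solve(2, 1)    # 2, 1, 3, 4, 7, 11, ...

# A linear combination of solutions is the solution with the combined start values.
combo = add(scale(3, fib), scale(-2, lucas))

# The termwise product is a sequence, but not a solution of the recurrence.
product = [x * y for x, y in zip(fib, lucas)]
```

In this sketch `combo` coincides with `solve(-4, 1)`, whereas `product` fails `satisfies_recurrence`, which mirrors the point that addition and scaling are the operations the structure supports.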

We would lose the broad applicability of the theory if we started insisting upon multiplication or anything else - yeah, there are things such as algebras, which are vector spaces with a multiplication rule, or inner product spaces or Banach spaces - and sometimes you do need this extra structure - but not everything is like that, so we can get more general results by not including anything we don't need. This is a bit different from the situation with complex numbers where it sometimes doesn't hurt to have extra stuff in your space - we're defining a class of things rather than a single thing, and we actually benefit from not requiring anything we're not going to use.


Coordinates (a.k.a. why $\mathbb R^n$ is so ubiquitous)

Let me finish this discussion by bringing coordinates into play. The final piece of the puzzle is that vector spaces are defined (up to isomorphism) by their dimension - which is to say, they can all be given coordinates.

In the two explicit examples, it should be fairly clear that a signed duration is just any expression of the form $x\text{ years}$ for any real $x$. So, durations can be represented by a single real number: adding durations adds their numbers, whereas scaling a duration multiplies its number by the scaling factor. This isn't inherent to the space of durations - you can choose any non-zero duration to base your coordinate system on and then say every other is some factor times that one - but it is possible.

Similarly, any displacement can be written uniquely as $$x\text{ meters north} + y\text{ meters east} + z\text{ meters up}$$ where we have three coordinates - and the sum of $$x_1\text{ meters north} + y_1\text{ meters east} + z_1\text{ meters up}$$ $$x_2\text{ meters north} + y_2\text{ meters east} + z_2\text{ meters up}$$ just turns out to be $$(x_1+x_2)\text{ meters north} + (y_1+y_2)\text{ meters east} + (z_1+z_2)\text{ meters up}$$ and scaling by a factor $c$ would give $$(cx)\text{ meters north} + (cy)\text{ meters east} + (cz)\text{ meters up}.$$ These are facts with real geometrical significance - they give us a way to talk about displacements mathematically. We might try to abbreviate these notations by just writing a tuple of $(x,y,z)$ instead to start being able to tersely write: $$(x_1,y_1,z_1)+(x_2,y_2,z_2)=(x_1+x_2,y_1+y_2,z_1+z_2)$$ $$c\cdot (x_1,y_1,z_1)=(cx_1,cy_1,cz_1)$$ and then, oh look, we just invented the vector space $\mathbb R^3$ consisting of tuples of three real numbers, with addition and scaling defined per-component - and this turns out to be a perfectly good representation of displacements. Of course, we might immediately start to generalize about other things we care about - a velocity can be written as $$x\text{ meters/second north} + y\text{ meters/second east} + z\text{ meters/second up}$$ and it's going to have the same rules for addition and scaling of the tuples $(x,y,z)$ as before - suggesting that $\mathbb R^3$ can be used to represent these too.
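The per-component rules can be sketched in a few lines of Python. The helper names `add` and `scale` and the sample displacements are invented for this illustration; each tuple is read as (meters north, meters east, meters up).

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (meters north, meters east, meters up)

def add(u: Vec3, v: Vec3) -> Vec3:
    """Add two displacements component by component."""
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])

def scale(c: float, u: Vec3) -> Vec3:
    """Scale a displacement by the real factor c."""
    return (c * u[0], c * u[1], c * u[2])

day1: Vec3 = (3.0, 4.0, 0.0)    # hypothetical movement on day one
day2: Vec3 = (-1.0, 2.0, 0.5)   # hypothetical movement on day two

net = add(day1, day2)           # net change in position over both days
twice = scale(2.0, day1)        # going twice as far in the same direction

# One of the vector space axioms, checked on this example:
# scaling a sum equals the sum of the scaled parts.
lhs = scale(2.0, add(day1, day2))
rhs = add(scale(2.0, day1), scale(2.0, day2))
```

Note that nothing here defines a product of two tuples; the representation only needs the two operations that displacements themselves have.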

Again, these notions aren't intrinsic to the space - I could just as well say any displacement can be written as $$x'\text{ lightyears northeast} + y'\text{ nanometers west} + z'\text{ smoots up-south}$$ where a "smoot" is any unit of distance. Sure, each displacement would have to be written using a different tuple $(x',y',z')$ than it was in the previous form - and we might find this form less intuitive - but it's equally correct, and gives another way we could represent the space using $\mathbb R^3$.

This process is essentially just picking a basis for our vector space - as long as we all agree which displacement is represented by $(x,y,z)$, we can happily use that representation pretty much everywhere. Even when our end goal isn't to talk about $\mathbb R^3$ (and it hardly ever is), the fact is that a lot of spaces we do care about can easily be manipulated through coordinates, which makes the space of tuples $\mathbb R^3$ rather fundamental - and, of course, it should be no obstacle to realize that we can have $\mathbb R^n$ for any $n$ that we like, with similar rules for addition and scaling.
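A small sketch of what "picking a basis" does to the tuples, under assumptions made purely for illustration: the alternative basis vectors below are written in (north, east, up) coordinates with unit-free integer entries, and the coordinates are found with Cramer's rule using exact fractions. The same displacement gets a different tuple in each basis, but coordinates still add component by component.

```python
from fractions import Fraction

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

def coords(v, basis):
    """Coordinates of v in the given basis, via Cramer's rule."""
    b1, b2, b3 = basis
    d = Fraction(det3(b1, b2, b3))
    if d == 0:
        raise ValueError("the three vectors do not form a basis")
    return (Fraction(det3(v, b2, b3)) / d,
            Fraction(det3(b1, v, b3)) / d,
            Fraction(det3(b1, b2, v)) / d)

def combine(cs, basis):
    """Rebuild the vector cs[0]*b1 + cs[1]*b2 + cs[2]*b3."""
    return tuple(sum(c * b[i] for c, b in zip(cs, basis)) for i in range(3))

# Hypothetical alternative basis: "northeast", "west", "up-south".
basis = ((1, 1, 0), (0, -1, 0), (-1, 0, 1))

u = (2, 6, 1)    # a displacement in (north, east, up) coordinates
w = (1, 0, 3)    # another one

cu = coords(u, basis)   # the same displacement u, new tuple
cw = coords(w, basis)
c_sum = coords(tuple(a + b for a, b in zip(u, w)), basis)
```

Here `cu` works out to `(3, -3, 1)`, `combine(cu, basis)` recovers `u`, and `c_sum` equals the componentwise sum of `cu` and `cw`.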

(*Okay, okay, you could get $1\text{ hour}^2$, whatever that means, and I'm sure physicists sometimes are happy to do this sort of thing - but clearly that's not a duration - it's transformed into something else)

(**Or, if you already believe this, don't stop believing it)
