Linear Algebra – Origin of the Dot and Cross Product

cross-product, linear-algebra, math-history

Most questions about these just relate to what they can be used for; that's fairly obvious to me, since I've been programming 3D games/simulations for a while, but I've never really understood their inner workings. I know I can get the cross product formula as the determinant of a carefully constructed matrix.

But what I want to ask is: how did the dot and cross product come to be? When were they "invented"? Are there detailed proofs or derivations? Did someone say, "Hey, wouldn't it be nice if we could construct a way to calculate a vector that is perpendicular to two given operands?"

Basically, how/why do they work?

I would appreciate explanations, links to other explanations, or other web resources. I've been searching the Internet lately, but most of what I find is about how to use them, and nothing that really gives them substance.

Best Answer

A little bit more of the 'how and why': the dot product comes about as a natural answer to the question: 'what functions do we have that take two vectors and produce a number?' Keep in mind that we have a natural additive function (vector or componentwise addition) that takes two vectors and produces another vector, and another natural multiplicative function (scalar multiplication) that takes a vector and a number and produces a vector. (We might also want another function that takes two vectors and produces another vector, something more multiplicative than additive — but hold that thought!) For now we'll call this function $D$, and specifically use the notation $D({\bf v},{\bf w})$ for it as a function of the two vectors ${\bf v}$ and ${\bf w}$.

So what kind of properties would we want this hypothetical function to have? Well, it seems natural to start by not distinguishing the two things it's operating on; let's make $D$ symmetric, with $D({\bf v},{\bf w})=D({\bf w},{\bf v})$. Since we have convenient addition and multiplication functions it would be nice if it 'played nice' with them. Specifically, we'd love it to respect our addition for each variable, so that $D({\bf v}_1+{\bf v}_2,{\bf w}) = D({\bf v}_1,{\bf w})+D({\bf v}_2,{\bf w})$ and $D({\bf v},{\bf w}_1+{\bf w}_2) = D({\bf v},{\bf w}_1)+D({\bf v},{\bf w}_2)$; and we'd like it to commute with scalar multiplication similarly, so that $D(a{\bf v}, {\bf w}) = aD({\bf v}, {\bf w})$ and $D({\bf v}, a{\bf w}) = aD({\bf v}, {\bf w})$ — these two conditions together are called linearity (more accurately, 'bilinearity': it's linear in each of its arguments). What's more, we may have some 'natural' basis for our vectors (for instance, 'North/East/up', at least locally), but we'd rather it weren't tied to any particular basis; $D({\bf v},{\bf w})$ shouldn't depend on what basis ${\bf v}$ and ${\bf w}$ are expressed in (it should be rotationally invariant). Furthermore, since any multiple of our function $D$ will satisfy the same equations as $D$ itself, we may as well choose a normalization of $D$. Since $D(a{\bf v},a{\bf v}) = aD({\bf v},a{\bf v}) = a^2D({\bf v},{\bf v})$ it seems that $D$ should have dimensions of (length$^2$), so let's go ahead and set $D({\bf v},{\bf v})$ equal to the squared length of ${\bf v}$, $|{\bf v}|^2$ (or equivalently, set $D({\bf v},{\bf v})$ to $1$ for any unit vector ${\bf v}$; since we chose $D$ to be basis-invariant, any unit vector is as good as any other).
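Since the question came from a 3D programming background, here is a quick numerical sanity check of these properties (a small Python/NumPy sketch, not part of the argument itself; the particular rotation matrix and random vectors are arbitrary choices): the ordinary componentwise dot product satisfies the symmetry, linearity, rotational-invariance, and normalization conditions described above.

```python
import numpy as np

rng = np.random.default_rng(4)
v, w = rng.standard_normal(3), rng.standard_normal(3)
a = rng.standard_normal()

D = np.dot                                   # the usual componentwise dot product

# Symmetry and (one instance of) bilinearity.
print(np.isclose(D(v, w), D(w, v)))                          # True
print(np.isclose(D(a * v, w), a * D(v, w)))                  # True

# Rotational invariance: rotate both vectors by the same rotation R.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])         # rotation about the z-axis
print(np.isclose(D(R @ v, R @ w), D(v, w)))                  # True

# Normalization: D(v, v) is the squared length of v.
print(np.isclose(D(v, v), np.linalg.norm(v) ** 2))           # True
```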

But these properties are enough to define the dot product! Since $$\begin{align} |{\bf v}+{\bf w}|^2 &= D({\bf v}+{\bf w},{\bf v}+{\bf w}) \\ &= D({\bf v}+{\bf w},{\bf v})+D({\bf v}+{\bf w},{\bf w}) \\ &= D({\bf v},{\bf v})+D({\bf w},{\bf v})+D({\bf v},{\bf w})+D({\bf w},{\bf w})\\ &= D({\bf v},{\bf v})+2D({\bf v},{\bf w})+D({\bf w},{\bf w}) \\ &= |{\bf v}|^2+|{\bf w}|^2+2D({\bf v},{\bf w}) \end{align}$$ then we can simply set $D({\bf v},{\bf w}) = {1\over2} \bigl(|{\bf v}+{\bf w}|^2-|{\bf v}|^2-|{\bf w}|^2\bigr)$. A little arithmetic should convince you that this gives the usual formula for the dot product.
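If it helps to see this concretely, here is a minimal NumPy sketch (my own illustration, using np.dot as the 'usual formula') checking that the polarization expression ${1\over2}\bigl(|{\bf v}+{\bf w}|^2-|{\bf v}|^2-|{\bf w}|^2\bigr)$ really does reproduce the componentwise dot product:

```python
import numpy as np

def D(v, w):
    """Dot product recovered purely from squared lengths, via
    D(v, w) = (|v + w|^2 - |v|^2 - |w|^2) / 2."""
    sq = lambda x: np.sum(x * x)              # squared length |x|^2
    return 0.5 * (sq(v + w) - sq(v) - sq(w))

rng = np.random.default_rng(0)
v, w = rng.standard_normal(3), rng.standard_normal(3)

print(D(v, w), np.dot(v, w))                  # the two values agree
assert np.isclose(D(v, w), np.dot(v, w))
```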

While the specific properties for the cross product aren't precisely the same, the core concept is: it's the only function that satisfies a fairly natural set of conditions. But there's one broad catch with the cross product — two, actually, though they're related. One is that the fact that the cross product takes two vectors and produces a third is an artifact of $3$-dimensional space; in general the operation that the cross product represents (orthogonality) can be formalized in $n$ dimensions either as a function from $(n-1)$ vectors to a single result or as a function from $2$ vectors that produces a 2-form, essentially an $n(n-1)/2$-dimensional object; it just so happens that when $n=3$ the cross product has the 'vector$\times$vector$\rightarrow$vector' nature that we were looking for. (Note that in $2$ dimensions the natural 'orthogonality' operation is essentially a function from one vector to one vector — it takes the vector $(x,y)$ to the vector $(y,-x)$!) The other catch is lurking in the description of the cross product as a 2-form: it turns out that this isn't quite the same thing as a vector! Instead it's essentially a covector, that is, a linear function from vectors to numbers (note that if you 'curry' the dot-product function $D$ above and consider the function $D_{\bf w}$ such that $D_{\bf w}({\bf v}) = D({\bf v},{\bf w})$, the resulting object $D_{\bf w}$ is a covector). For most purposes we can treat covectors as just vectors, but not uniformly; the most important consequence of this is one that computer graphics developers have long been familiar with: normals don't transform the same way vectors do! In other words, if we have ${\bf u} = {\bf v}\times{\bf w}$, then for a transform $Q$ it's not (necessarily) the case that the cross product of the transformed vectors, $(Q{\bf v})\times(Q{\bf w})$, equals the transformed result $Q{\bf u}$; instead it's ${\bf u}$ transformed by $\det(Q)\,Q^{-T}$ (the transpose of the adjugate, or 'classical adjoint', of $Q$), which is why graphics code transforms normals by the inverse transpose rather than by $Q$ itself. For more background on the details of this, I'd suggest looking into exterior algebra, geometric algebra, and in general the theory of linear forms.
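The 'normals transform differently' point is easy to check numerically. The sketch below illustrates the standard identity $(Q{\bf v})\times(Q{\bf w}) = \det(Q)\,Q^{-T}({\bf v}\times{\bf w})$ (again in NumPy; the random transform $Q$ is just an arbitrary invertible matrix chosen for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.standard_normal((3, 3))           # a generic (invertible) linear transform
v, w = rng.standard_normal(3), rng.standard_normal(3)

u = np.cross(v, w)

lhs = np.cross(Q @ v, Q @ w)                       # cross product of the transformed vectors
rhs = np.linalg.det(Q) * np.linalg.inv(Q).T @ u    # u transformed by det(Q) * Q^{-T}

print(np.allclose(lhs, rhs))              # True: normals pick up the inverse transpose
print(np.allclose(lhs, Q @ u))            # generally False: Q itself is the wrong transform
```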

ADDED: Having spent some more time thinking about this over lunch, I think the most natural approach to understanding where the cross product 'comes from' is through the so-called volume form: a function $V({\bf u}, {\bf v}, {\bf w})$ from three vectors to a number that returns the (signed) volume of the parallelepiped spanned by ${\bf u}$, ${\bf v}$, and ${\bf w}$. (This is also the determinant of the matrix with ${\bf u}$, ${\bf v}$, and ${\bf w}$ as its columns, but that's a whole different story...) Specifically, there are two key facts:

  1. Given an orthonormal basis and given some linear function $f({\bf v})$ from vectors to numbers (remember that linear means that $f({\bf v}+{\bf w}) = f({\bf v})+f({\bf w})$ and $f(a{\bf v}) = af({\bf v})$), we can write down a vector ${\bf u}$ such that $f()$ is the same as the covector $D_{\bf u}$ (that is, we have $f({\bf v}) = D({\bf u}, {\bf v})$ for all ${\bf v}$). To see this, let the basis be $(\vec{e}_{\bf x}, \vec{e}_{\bf y}, \vec{e}_{\bf z})$; now let $u_{\bf x} = f(\vec{e}_{\bf x})$, and similarly for $u_{\bf y}$ and $u_{\bf z}$, and define ${\bf u} = (u_{\bf x},u_{\bf y},u_{\bf z})$ (in the basis we were provided). Obviously $f()$ and $D_{\bf u}$ agree on the three basis vectors, and so by linearity (remember, we explicitly said that $f$ was linear, and $D_{\bf u}$ is linear because the dot product is) they agree everywhere.
  2. The volume form $V({\bf u}, {\bf v}, {\bf w})$ is linear in all its arguments - that is, $V({\bf s}+{\bf t}, {\bf v}, {\bf w}) = V({\bf s}, {\bf v}, {\bf w})+V({\bf t}, {\bf v}, {\bf w})$, and likewise in the other two slots. It's obvious that the form is 'basis-invariant' — it exists regardless of what particular basis is used to write its vector arguments — and fairly obvious that it satisfies the scalar-multiplication property that $V(a{\bf u}, {\bf v}, {\bf w}) = aV({\bf u}, {\bf v}, {\bf w})$ (note that this is why we had to define it as a signed volume - $a$ could be negative!). The linearity under addition is a little bit trickier to see; it's probably easiest to think of the analogous area form $A({\bf v}, {\bf w})$ in two dimensions: imagine stacking the parallelograms spanned by $({\bf u}, {\bf w})$ and $({\bf v}, {\bf w})$ on top of each other to form a sort of chevron, and then moving the triangle formed by ${\bf u}$, ${\bf v}$ and ${\bf u}+{\bf v}$ from one side of the chevron to the other to get the parallelogram $({\bf u}+{\bf v}, {\bf w})$ with the same area. The same concept works in three dimensions by stacking parallelepipeds, but the fact that the two 'chunks' are the same shape is trickier to see. This linearity, incidentally, explains why the form changes signs when you swap arguments (that is, why $V({\bf u}, {\bf v}, {\bf w}) = -V({\bf v}, {\bf u}, {\bf w})$): from the definition $V({\bf u}, {\bf u}, {\bf w}) = 0$ for any ${\bf u}$ (it's the volume of a degenerate parallelepiped that has been flattened into the plane spanned by ${\bf u}$ and ${\bf w}$), and using linearity to break down $0 = V({\bf u}+{\bf v}, {\bf u}+{\bf v}, {\bf w})$ shows that $V({\bf u}, {\bf v}, {\bf w}) + V({\bf v}, {\bf u}, {\bf w}) = 0$. (Both of these facts are easy to check numerically; see the sketch right after this list.)
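Both facts are straightforward to verify numerically if you take the determinant description of $V$ above at face value. Here is a small NumPy sketch (the particular linear function $f$ and the random vectors are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def V(u, v, w):
    """Signed volume of the parallelepiped spanned by u, v, w
    (the determinant of the matrix with u, v, w as its columns)."""
    return np.linalg.det(np.column_stack([u, v, w]))

s, t, v, w, x = (rng.standard_normal(3) for _ in range(5))

# Fact 1: a linear function f is represented by the vector u = (f(e_x), f(e_y), f(e_z)).
f = lambda p: 2.0 * p[0] - p[1] + 0.5 * p[2]     # some arbitrary linear function
u = np.array([f(e) for e in np.eye(3)])          # evaluate f on the basis vectors
print(np.isclose(f(x), np.dot(u, x)))            # True: f(x) == D(u, x)

# Fact 2: the volume form is linear in each slot and flips sign when two arguments swap.
print(np.isclose(V(s + t, v, w), V(s, v, w) + V(t, v, w)))   # True
print(np.isclose(V(s, v, w), -V(v, s, w)))                   # True
```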

Now, the fact that the volume form $V({\bf u}, {\bf v}, {\bf w})$ is linear means that we can do the same sort of 'currying' that we talked about above and, for any two vectors ${\bf v}$ and ${\bf w}$, consider the function $C_{\bf vw}$ from vectors ${\bf u}$ to numbers defined by $C_{\bf vw}({\bf u}) = V({\bf u}, {\bf v}, {\bf w})$. Since this is a linear function (because $V$ is linear, by point 2), we know that we have some vector ${\bf c}$ such that $C_{\bf vw} = D_{\bf c}$ (by point 1). And finally, we define the cross product of the two vectors ${\bf v}$ and ${\bf w}$ as this 'vector' ${\bf c}$. This explains why the cross product is linear in both of its arguments (because the volume form $V$ was linear in all three of its arguments) and it explains why ${\bf u}\times{\bf v} = -{\bf v}\times{\bf u}$ (because $V$ changes sign on swapping two parameters). It also explains why the cross product isn't exactly a vector: instead it's really the linear function $C_{\bf vw}$ disguising itself as a vector (by the one-to-one correspondence through $D_{\bf c}$). I hope this helps explain things better!
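For completeness, here is the whole construction carried out in a few lines of NumPy (again using the determinant as the volume form, and reading off ${\bf c}$ component by component from the basis vectors as in point 1); the result agrees with the familiar componentwise cross product:

```python
import numpy as np

rng = np.random.default_rng(3)

def V(u, v, w):
    """Signed volume: determinant of the matrix with u, v, w as columns."""
    return np.linalg.det(np.column_stack([u, v, w]))

v, w = rng.standard_normal(3), rng.standard_normal(3)

# Curry the volume form: C_vw(u) = V(u, v, w) is a linear function of u alone...
C_vw = lambda u: V(u, v, w)

# ...so by point 1 it equals D_c for the vector c = (C_vw(e_x), C_vw(e_y), C_vw(e_z)).
c = np.array([C_vw(e) for e in np.eye(3)])

print(np.allclose(c, np.cross(v, w)))   # True: c is exactly the usual cross product
```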