[Math] What exactly is the relation between vectors, matrices, and tensors?


I am trying to understand what Tensors are (I have no physics or pure math background, and am starting with machine learning).

In an introduction to Tensors it is said that tensors are a generalization of scalars, vectors and matrices:

Scalars are 0-order tensors, vectors are 1-order tensors, and matrices are 2-order tensors. An n-order tensor is simply an n-dimensional array of numbers.

However, this does not say anything about the algebraic or geometric properties of tensors, and it seems from reading around the internet that tensors are algebraically defined, rather than defined as an array of numbers.

Similarly, vectors are defined as elements of a set that satisfy the vector space axioms. I understand this definition of a vector space. And I understand that matrices are representations of linear transformations of vectors.

Question: I am trying to understand more intuitively what a tensor is, and what the algebraic and intuitive/geometric relation is between tensors on the one hand and vectors/matrices on the other (taking into account that matrices are representations of linear transformations of vectors).

Best Answer

I will try to do things as concretely as possible by working in $\mathbb{R}^n$.

As you've mentioned, matrices represent linear maps. Tensors represent multi-linear maps.

Let's start with vectors. There are, broadly speaking, two "kinds" of vectors. We have column vectors, which are just the usual things we call vectors, and we also have row vectors. You may not see the point of distinguishing the two, but it is actually important. To distinguish them, I will refer to column vectors simply as vectors, whereas row vectors will be termed covectors. I will use boldface notation like $\mathbf{v}$ for vectors, with an added tilde $\mathbf{\tilde{v}}$ for covectors. To distinguish the spaces that they live in, I will say that vectors are elements of $\mathbb{R}^n$ whereas covectors are elements of $\mathbb{R}^{n*}$.

Suppose that we have a vector $\mathbf{v}$ and a covector $\mathbf{\tilde{u}}$. Then we may multiply the two to get $$\mathbf{\tilde{u}}\mathbf{v} = \begin{pmatrix}u_1 & \cdots & u_n\end{pmatrix}\begin{pmatrix}v_1 \\ \vdots \\ v_n\end{pmatrix}=u_1v_1 + \cdots + u_nv_n.$$ Notice that the result is just a scalar number (in fact, it is a dot product). Thus, $\mathbf{\tilde{u}}$ is naturally an object that takes a vector and returns a number. The object $\mathbf{\tilde{u}}$ is a linear map that maps vectors to numbers, i.e., $\mathbf{\tilde{u}}:\mathbb{R}^n \rightarrow \mathbb{R}$.
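As a quick numerical sanity check, here is a minimal NumPy sketch of this product (the arrays and names are my own illustration, not part of the answer): a row vector times a column vector collapses to a single number, the dot product.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])   # components of the covector u~
v = np.array([4.0, 5.0, 6.0])   # components of the vector v

# Row vector times column vector: a (1, n) @ (n, 1) product is a 1x1 matrix
# whose single entry is u_1 v_1 + ... + u_n v_n.
scalar = (u.reshape(1, -1) @ v.reshape(-1, 1)).item()
print(scalar)           # 32.0
print(np.dot(u, v))     # the same number, 32.0
```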

We can go one step further. In the above product, $\mathbf{\tilde{u}}$ naturally acts on $\mathbf{v}$ from the left to give a number. But we can also turn this around to say that $\mathbf{v}$ acts on $\mathbf{\tilde{u}}$ from the right to give a number. Thus we can also regard $\mathbf{v}$ as a map that takes covectors to numbers, i.e., $\mathbf{v}:\mathbb{R}^{n*}\rightarrow \mathbb{R}$. This restores the symmetry between the two.

The take-away message so far is that vectors are objects that take covectors to numbers, whereas covectors are objects that take vectors to numbers.

Now, what do matrices do? A matrix $A$ can be viewed in three distinct ways:

i) It takes a vector $\mathbf{v}$ to the vector $A\mathbf{v}$, i.e., $A:\mathbb{R}^n \rightarrow \mathbb{R}^n$.

ii) It takes a covector $\mathbf{\tilde{u}}$ to the covector $\mathbf{\tilde{u}}A$, i.e., $A:\mathbb{R}^{n*} \rightarrow \mathbb{R}^{n*}$.

iii) Most importantly for our purposes, however, it takes a vector $\mathbf{v}$ and a covector $\mathbf{\tilde{u}}$ and gives back a number $\mathbf{\tilde{u}}A\mathbf{v}$. Thus we can view matrices as objects that take both a vector and a covector and give back a number, i.e., $A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}$.
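Here is a small NumPy sketch of all three views at once; the $2\times 2$ matrix and the numbers are hypothetical stand-ins of my own, just to make the three signatures concrete.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, -1.0])   # a vector
u = np.array([2.0, 0.5])    # a covector, written as a row

print(A @ v)        # (i)   vector in, vector out:          [-1. -1.]
print(u @ A)        # (ii)  covector in, covector out:      [3.5 6. ]
print(u @ A @ v)    # (iii) covector and vector in, number out: -2.5
```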

Notice that the first two properties can be seen as consequences of the last. If we give the matrix a vector, then we are still missing a covector in order to get a number. Thus when a matrix gets a vector, it becomes an object that maps covectors to numbers. These are precisely vectors as we've previously shown. Likewise, if we give the matrix a covector, then we still need a vector to get a number. Thus a matrix maps covectors to things that map vectors to numbers, i.e., covectors.

What we've shown is that the three ways of looking at matrices are equivalent. Symbolically, we can write $$A:\mathbb{R}^n \rightarrow \mathbb{R}^n \cong A:\mathbb{R}^{n*} \rightarrow \mathbb{R}^{n*} \cong A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}.\tag{1}$$ A simple way to remember these equivalences is to remember that we can move $\mathbb{R}^{n}$/$\mathbb{R}^{n*}$ from the left of the arrow to the right of the arrow along with adding/removing the $*$.
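The equivalences in $(1)$ can also be phrased as partial application: feeding $A$ only one of its two arguments leaves a map that still wants the other. A small illustrative sketch of my own (reusing the hypothetical $A$, $\mathbf{v}$, $\mathbf{\tilde{u}}$ from the previous snippet):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, -1.0])   # vector
u = np.array([2.0, 0.5])    # covector

# Feed A only the vector: what remains is a map covector -> number, i.e., a vector.
eats_a_covector = lambda x: x @ (A @ v)
# Feed A only the covector: what remains is a map vector -> number, i.e., a covector.
eats_a_vector = lambda y: (u @ A) @ y

print(eats_a_covector(u), eats_a_vector(v), u @ A @ v)   # all three print -2.5
```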

So far this seems like a lot of pointless linear algebraic gymnastics, but in fact we are almost there. Matrices take $\mathrm{(covector,vector)}$ pairs to numbers, but there are other very natural objects that do the exact same thing, namely $\mathrm{(vector,covector)}$ pairs. Given a $\mathrm{(vector,covector)}$ pair $(\mathbf{v},\mathbf{\tilde{u}})$, we can have it act on a $\mathrm{(covector,vector)}$ pair $(\mathbf{\tilde{x}},\mathbf{y})$ as follows: $$(\mathbf{v},\mathbf{\tilde{u}})(\mathbf{\tilde{x}},\mathbf{y}) = (\mathbf{\tilde{x}}\mathbf{v})(\mathbf{\tilde{u}}\mathbf{y}).$$ Notice that we are simply using the fact that the vector $\mathbf{v}$ takes the covector $\mathbf{\tilde{x}}$ to the number $\mathbf{\tilde{x}}\mathbf{v}$ while the covector $\mathbf{\tilde{u}}$ takes the vector $\mathbf{y}$ to the number $\mathbf{\tilde{u}}\mathbf{y}$.
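Numerically, this pairing can be checked with np.outer, since the matrix $\mathbf{v}\mathbf{\tilde{u}}$ is exactly the outer product; the following is an illustrative sketch with made-up numbers.

```python
import numpy as np

v = np.array([1.0, 2.0])    # vector
u = np.array([3.0, 4.0])    # covector
x = np.array([0.5, 1.0])    # covector
y = np.array([2.0, -1.0])   # vector

lhs = (x @ v) * (u @ y)           # (x~ v)(u~ y)
rhs = x @ np.outer(v, u) @ y      # x~ (v u~) y, where v u~ is an n x n matrix
print(lhs, rhs)                   # 5.0 5.0
```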

The similarity between $\mathrm{(vector,covector)}$ pairs and matrices is more than skin-deep. Every matrix map can be expressed as a linear combination of these $\mathrm{(vector,covector)}$ pairs. In fact, it is rather trivial to do. For example a matrix $A$ with entries $A_{ij}$ can be written as $$A = \sum_{i,j=1}^nA_{ij}\mathbf{e}_i\mathbf{\tilde{e}}_{j},$$ so the map $A:\mathbb{R}^{n*}\times \mathbb{R}^n \rightarrow \mathbb{R}$ can be equivalently represented as $$A\cong \sum_{i,j=1}^n A_{ij}(\mathbf{e}_{i},\mathbf{\tilde{e}}_j).$$
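This expansion is easy to verify numerically; the following NumPy sketch (my own illustration) rebuilds an arbitrary matrix from outer products of the standard basis vectors.

```python
import numpy as np

n = 3
A = np.arange(1.0, 10.0).reshape(n, n)   # an arbitrary 3x3 matrix
I = np.eye(n)                            # columns are the basis vectors e_1, ..., e_n

# Rebuild A from the primitive (vector, covector) pairs e_i e~_j.
B = sum(A[i, j] * np.outer(I[:, i], I[:, j])
        for i in range(n) for j in range(n))
print(np.allclose(A, B))                 # True
```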

In the present context, we typically don't use brackets $(\ ,\ )$ to denote $\mathrm{(vector,covector)}$ pairs. Instead, we use these neat little circles $\otimes$: $$(\mathbf{v},\mathbf{\tilde{u}}) := \mathbf{v}\otimes \mathbf{\tilde{u}}.$$

These vector/covector pairs are the objects that we call tensors. They live inside the tensor space $\mathbb{R}^n \otimes \mathbb{R}^{n*}$. Specifically, these are $(1,1)$ tensors: tensors built from one vector and one covector. What we've also shown is that $$\mathbb{R}^{n,n} \cong \mathbb{R}^n \otimes \mathbb{R}^{n*},$$ that is, the vector space of $n\times n$ matrices $\mathbb{R}^{n,n}$ is isomorphic to the $(1,1)$ tensor space $\mathbb{R}^n \otimes \mathbb{R}^{n*}$.

The generalization is now clear. We can consider objects built from $r$ vectors and $s$ covectors: $$\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \mathbf{\tilde{u}}_1 \otimes \cdots\otimes \mathbf{\tilde{u}}_s.\tag{2}$$ These are $(r,s)$ tensors: multi-linear maps that take $r$ covectors and $s$ vectors and return a number. In explicit, gory detail: $$\begin{eqnarray}(\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \mathbf{\tilde{u}}_1 \otimes \cdots \otimes \mathbf{\tilde{u}}_s)(\mathbf{\tilde{x}}_1,\cdots,\mathbf{\tilde{x}}_r,\mathbf{y}_1,\cdots,\mathbf{y}_s)\\ =(\mathbf{\tilde{x}}_1\mathbf{v}_1)\cdots(\mathbf{\tilde{x}}_r\mathbf{v}_r)(\mathbf{\tilde{u}}_1\mathbf{y}_1)\cdots(\mathbf{\tilde{u}}_s\mathbf{y}_s).\end{eqnarray}$$
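To see this formula in action, here is a hedged NumPy sketch of a primitive $(2,1)$ tensor $\mathbf{v}_1 \otimes \mathbf{v}_2 \otimes \mathbf{\tilde{u}}$ acting on two covectors and a vector; np.einsum plays the role of $\otimes$ and of the contraction, and all arrays are random stand-ins of my own.

```python
import numpy as np
rng = np.random.default_rng(0)

n = 3
v1, v2, u = rng.standard_normal((3, n))   # two vectors and a covector
x1, x2, y = rng.standard_normal((3, n))   # two covectors and a vector

# The primitive (2,1) tensor v1 (x) v2 (x) u~ as a 3-index array ...
T = np.einsum('i,j,k->ijk', v1, v2, u)

# ... acting as a multi-linear map on (x~1, x~2, y):
lhs = np.einsum('ijk,i,j,k->', T, x1, x2, y)
rhs = (x1 @ v1) * (x2 @ v2) * (u @ y)
print(np.isclose(lhs, rhs))               # True
```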

That's it. Tensors are nothing more than multi-linear maps $$\underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s\ \text{copies}} \rightarrow \mathbb{R}. \tag{3}$$ You can convince yourself that any multi-linear map of this kind can be written as a linear combination of primitive tensors of the form given in $(2)$, i.e., we have the vector space isomorphism $$\underbrace{\mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n}_{r\ \text{copies}} \otimes \underbrace{\mathbb{R}^{n*} \otimes \cdots \otimes \mathbb{R}^{n*}}_{s\ \text{copies}} \cong \{\text{multi-linear maps of the form }(3)\}.$$
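In the same spirit as the matrix expansion earlier, an arbitrary multi-index array decomposes into primitive basis tensors $\mathbf{e}_i \otimes \mathbf{\tilde{e}}_j \otimes \mathbf{\tilde{e}}_k$. A quick illustrative check for a random $(1,2)$ tensor (again a sketch of my own, not the answer's construction):

```python
import numpy as np
rng = np.random.default_rng(1)

n = 2
T = rng.standard_normal((n, n, n))   # an arbitrary (1,2) tensor, stored as an array
I = np.eye(n)                        # columns are the basis vectors

# Expand T over the primitive tensors e_i (x) e~_j (x) e~_k.
S = sum(T[i, j, k] * np.einsum('a,b,c->abc', I[:, i], I[:, j], I[:, k])
        for i in range(n) for j in range(n) for k in range(n))
print(np.allclose(T, S))             # True
```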

Of course in practice, tensors are very versatile. Using the same kind of reasoning that gave us the equivalences in $(1)$, we can move $\mathbb{R}^n$/$\mathbb{R}^{n*}$ from the left of the arrow to the right. Thus we can have identities of the form $$\begin{eqnarray} \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s\ \text{copies}} \rightarrow \mathbb{R} \\ \cong \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{r-k\ \text{copies}} \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{s-\ell\ \text{copies}} \rightarrow \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{k\ \text{copies}}\times \underbrace{\mathbb{R}^{n*} \times \cdots \times \mathbb{R}^{n*}}_{\ell\ \text{copies}}.\end{eqnarray}$$ Thus we can equivalently regard $(r,s)$ tensors as multi-linear maps that take $r-k$ covectors and $s-\ell$ vectors and give back $k$ vectors and $\ell$ covectors. This is similar to the corresponding properties that matrices/$(1,1)$-tensors had.
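Concretely, moving a slot across the arrow is just a partial contraction. As an illustrative sketch (random arrays, my own naming), feeding a $(1,2)$ tensor only one of its vector arguments leaves behind a $(1,1)$ tensor, i.e., a matrix, and contracting the rest later gives the same number as contracting everything at once.

```python
import numpy as np
rng = np.random.default_rng(2)

n = 3
T = rng.standard_normal((n, n, n))    # a (1,2) tensor: one covector slot, two vector slots
x = rng.standard_normal(n)            # covector
y1, y2 = rng.standard_normal((2, n))  # two vectors

# Contract everything at once: a number.
full = np.einsum('ijk,i,j,k->', T, x, y1, y2)

# Feed only y2: what is left is a (1,1) tensor, i.e., a matrix / linear map.
M = np.einsum('ijk,k->ij', T, y2)
print(np.isclose(full, x @ M @ y1))   # True
```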

Hopefully you can see now that tensors are a very versatile concept which directly generalizes the notion of linear mappings. I shouldn't have to convince you of the utility that such objects have. Ideally, I would close with a practical example, but this answer is already uncomfortably long, so I'll stop here.