Solved – Why the sudden fascination with tensors

linear algebra, machine learning, matrix, references, tensor

I've noticed lately that a lot of people are developing tensor equivalents of many methods (tensor factorization, tensor kernels, tensors for topic modeling, etc.). I'm wondering, why is the world suddenly fascinated with tensors? Are there recent papers or standard results that are particularly surprising and that brought this about? Are tensor methods computationally a lot cheaper than previously suspected?

I'm not being glib, I sincerely am interested, and if there are any pointers to papers about this, I'd love to read them.

Best Answer

This is not an answer to your question, but an extended comment on the issue that has been raised here in comments by different people, namely: are machine learning "tensors" the same thing as tensors in mathematics?

Now, according to Cichocki 2014, Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions, and Cichocki et al. 2014, Tensor Decompositions for Signal Processing Applications,

A higher-order tensor can be interpreted as a multiway array, [...]

A tensor can be thought of as a multi-index numerical array, [...]

Tensors (i.e., multi-way arrays) [...]

So-called tensors in machine learning

So in machine learning / data processing a tensor appears to be simply defined as a multidimensional numerical array. An example of such a 3D tensor would be $1000$ video frames of $640\times 480$ size. A usual $n\times p$ data matrix is an example of a 2D tensor according to this definition.
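As a minimal sketch of this usage, assuming NumPy and made-up sizes (10 frames rather than 1000 to keep the arrays small; the variable names are hypothetical), a "3D tensor" and a "2D tensor" in this sense are just arrays with three and two axes:

```python
import numpy as np

# Hypothetical sizes: 10 greyscale frames of 640x480 video (height 480, width 640).
n_frames, height, width = 10, 480, 640
video = np.zeros((n_frames, height, width), dtype=np.uint8)

# An n x p data matrix: n subjects (rows), p features (columns).
n, p = 5, 3
X = np.zeros((n, p))

# ndim is the number of axes; shape is the size along each axis.
print(video.ndim, video.shape)
print(X.ndim, X.shape)
```

Nothing in these objects says how the numbers should change under a change of coordinates; they are plain arrays.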

This is not how tensors are defined in mathematics and physics!

A tensor can be defined as a multidimensional array obeying certain transformation laws under the change of coordinates (see Wikipedia or the first sentence in the MathWorld article). A better but equivalent definition (see Wikipedia) says that a tensor on a vector space $V$ is an element of $V\otimes\ldots\otimes V^*$. Note that this means that, when represented as multidimensional arrays, tensors are of size $p\times p$ or $p\times p\times p$ etc., where $p$ is the dimensionality of $V$.
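To make "certain transformation laws" concrete, here is a sketch under one common convention (the matrix $A$ and the primed notation are introduced here for illustration). Suppose the components of vectors in $V$ change as $v' = Av$ for an invertible $p\times p$ matrix $A$. Then the $p\times p$ arrays representing second-order tensors of different types change as

$$
\begin{aligned}
T' &= A\,T\,A^{-1} && \text{(linear map, an element of } V\otimes V^*\text{)},\\
\Sigma' &= A\,\Sigma\,A^\top && \text{(element of } V\otimes V\text{)},\\
B' &= A^{-\top} B\,A^{-1} && \text{(bilinear form, an element of } V^*\otimes V^*\text{)}.
\end{aligned}
$$

All three are $p\times p$ arrays, but they differ in how they respond to $A$, and that response is part of what makes each of them a tensor.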

All tensors well-known in physics are like that: inertia tensor in mechanics is $3\times 3$, electromagnetic tensor in special relativity is $4\times 4$, Riemann curvature tensor in general relativity is $4\times 4\times 4\times 4$. Curvature and electromagnetic tensors are actually tensor fields, which are sections of tensor bundles (see e.g. here but it gets technical), but all of that is defined over a vector space $V$.

Of course one can construct a tensor product $V\otimes W$ of a $p$-dimensional $V$ and a $q$-dimensional $W$, but its elements are usually not called "tensors", as stated e.g. here on Wikipedia:

In principle, one could define a "tensor" simply to be an element of any tensor product. However, the mathematics literature usually reserves the term tensor for an element of a tensor product of a single vector space $V$ and its dual, as above.

One example of a real tensor in statistics would be a covariance matrix. It is $p\times p$ and transforms in a particular way when the coordinate system in the $p$-dimensional feature space $V$ is changed. It is a tensor. But an $n\times p$ data matrix $X$ is not.
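The "particular way" can be checked numerically. The following is a rough sketch, assuming NumPy, random data, and a random (almost surely invertible) matrix `A` standing in for a change of coordinates in feature space; all names are hypothetical. If every feature vector $x$ becomes $Ax$, the sample covariance $\Sigma$ becomes $A\Sigma A^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

X = rng.normal(size=(n, p))   # n subjects (rows), p features (columns)
A = rng.normal(size=(p, p))   # assumed change of coordinates in feature space

# Apply the transformation to every row: x -> A x.
Y = X @ A.T

cov_X = np.cov(X, rowvar=False)   # p x p sample covariance of the original features
cov_Y = np.cov(Y, rowvar=False)   # p x p sample covariance of the transformed features

# Compare the recomputed covariance with the transformed original covariance.
transforms_as_tensor = np.allclose(cov_Y, A @ cov_X @ A.T)
print(transforms_as_tensor)
```

The data matrix itself only picks up `A` on one side (`Y = X @ A.T`); its row index is not tied to any coordinate system in $V$.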

But can we at least think of $X$ as an element of tensor product $W\otimes V$, where $W$ is $n$-dimensional and $V$ is $p$-dimensional? For concreteness, let rows in $X$ correspond to people (subjects) and columns to some measurements (features). A change of coordinates in $V$ corresponds to linear transformation of features, and this is done in statistics all the time (think of PCA). But a change of coordinates in $W$ does not seem to correspond to anything meaningful (and I urge anybody who has a counter-example to let me know in the comments). So it does not seem that there is anything gained by considering $X$ as an element of $W\otimes V$.

And indeed, the common notation is to write $X\in\mathbb R^{n\times p}$, where $\mathbb R^{n\times p}$ is the set of all $n\times p$ matrices (which, by the way, are defined as rectangular arrays of numbers, without any assumed transformation properties).

My conclusion is: (a) machine learning tensors are not math/physics tensors, and (b) it is mostly not useful to see them as elements of tensor products either.

Instead, they are multidimensional generalizations of matrices. Unfortunately, there is no established mathematical term for that, so it seems that this new meaning of "tensor" is now here to stay.
