When to *say* “tensor” instead of “matrix”

*Tags: definition, matrices, tensors, terminology*

The difference between a tensor and a matrix is a subtle but important point, and one that has been discussed at length.

Briefly put, in the context of a vector space $V$ over a field $F$ (and its dual $V^*$), a $(p, q)$-tensor $T$ is a multilinear map $\ (V^*)^p \times V^q \to F\ $.
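
For concreteness, a standard illustration of the transformation law (here $R$ denotes the invertible change-of-basis matrix): under a change of basis $e'_j = R^k{}_j\, e_k$, the components of a $(1, 1)$-tensor transform as
$$ T'^i{}_j = (R^{-1})^i{}_k \, T^k{}_l \, R^l{}_j, \qquad \text{i.e.}\quad M' = R^{-1} M R, $$
while the tensor $T$ itself is unchanged; this is the same $R^{-1} T R$ pattern that comes up again below.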

Any basis we choose for $V$ (together with its dual basis for $V^*$) induces components for $T$, and a matrix $M$ is the representation of $T$ in that particular basis. Had we chosen a different basis, the matrix representation of $T$ would be some different $M' \neq M$, while $T$ is just $T$. As a mundane example on $\mathbb{R}^2$ for a $(1, 0)$-tensor, the idea is just that $\ \begin{bmatrix}2 \\ 3\end{bmatrix} \neq \begin{bmatrix}1 \\ -2\end{bmatrix}\ $ while $\ 2e_1 + 3e_2 = e'_1 - 2e'_2\ $. One consistent basis must be used to write a matrix equation, while a tensor equation (like the latter, usually shown in Einstein notation) expresses equality of tensors, which holds regardless of basis.
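
A quick numerical sketch of that example (a minimal illustration, assuming the hypothetical primed basis $e'_1 = (2, 1)$, $e'_2 = (0, -1)$, chosen only so the coefficients match the ones above):

```python
import numpy as np

# The same abstract vector, written in two different bases of R^2.
v = np.array([2.0, 3.0])             # components in the standard basis e_1, e_2

# Hypothetical primed basis: columns are e'_1 and e'_2 in standard coordinates.
B_prime = np.array([[2.0, 0.0],
                    [1.0, -1.0]])

# Components of the same vector in the primed basis: solve B' c = v.
c = np.linalg.solve(B_prime, v)
print(c)               # [ 1. -2.]   i.e.  v = 1*e'_1 - 2*e'_2

# The column of numbers changed ([2, 3] vs [1, -2]); the vector itself did not.
print(B_prime @ c)     # [2. 3.]   the same point of R^2
```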

In practice, however, I find the literature commonly saying "matrix" where "tensor" would be more appropriate, making me doubt my understanding. Take, for example, the "covariance matrix" in probability theory. Covariance is conceptualized as the ellipse in the following geometric depiction:

*[Figure: covariance depicted as an ellipse]*

This geometric object is invariant to our choice of basis (i.e., the ellipse exists regardless of how you draw your grid-lines), and thus is best modeled with a tensor. Everything I've ever seen done with a covariance "matrix" has been tensorial, from the $R^{-1} T R$ style transformations used in principal component analysis (typical of $(1, 1)$-tensors) to the full contraction to a scalar in the exponent of the multivariate Gaussian density (where the inverse covariance "matrix" is literally used as a bilinear map $V \times V \to \mathbb{R}$).
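
To see that tensorial behavior numerically, here is a minimal NumPy sketch (the covariance, mean, and rotation below are made-up values for illustration): rotating the coordinate system changes every entry of the covariance matrix, but the scalar in the Gaussian exponent does not change.

```python
import numpy as np

# Made-up covariance matrix and mean on R^2.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
mu = np.array([1.0, -1.0])
x  = np.array([2.5,  0.5])

# An arbitrary rotation of the basis (orthonormal change of coordinates).
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# In the rotated basis, vectors become R v and the covariance becomes R Sigma R^T.
Sigma_rot = R @ Sigma @ R.T
mu_rot, x_rot = R @ mu, R @ x

print(Sigma)      # entries in the original basis
print(Sigma_rot)  # different entries in the rotated basis

# The full contraction appearing in the Gaussian exponent is basis-independent:
d, d_rot = x - mu, x_rot - mu_rot
print(d @ np.linalg.inv(Sigma) @ d)              # 1.35
print(d_rot @ np.linalg.inv(Sigma_rot) @ d_rot)  # 1.35 (same, up to rounding)
```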

Meanwhile, an extremely similar object, the "inertia tensor" from rigid-body mechanics, which describes the spread of mass (instead of probability), is so deliberately called a tensor that many mechanics books include a section explaining the very distinction I outlined above. Surely there is a naming inconsistency here?

Then, probably owing to the popular software library "TensorFlow", I see the matrices used in neural networks being called "tensors" by the machine learning community, even when there is no discernible basis-independent abstract object being described by these matrices. They are just linear mappings, and in most cases are not even endomorphisms.

I think many people trying to understand tensors could benefit from clarifying these semantics. When should we really say "tensor"? Is it pretentious to say "covariance tensor"? Do any theories use a square matrix that isn't just an order-2 tensor written in a particular basis? (Perhaps the Jacobian matrix?) Is there any kind of indicative mathematical expression or operation that makes you go "yup, that matrix is just representing a tensor; the important object here is the tensor itself"?

Best Answer

A few unorganized thoughts below. If this answer is not really appropriate, let me know and I will remove it.

saying "matrix" where "tensor" would be more appropriate, making me doubt my understanding

I think your understanding of the distinction is fine. One issue with math terminology/notation (that is characteristic of natural language) is that predominantly used terms tend to "stick," even if there are other more appropriate or less confusing choices that just didn't catch on. Another issue is that usage is field-specific, and, in this case, if people in those fields did not focus much on basis-invariance, it may not have been worth dwelling on it and using, say, "covariance tensor," or stressing that it is a bilinear form, instead of saying "covariance matrix."

In fields outside of mathematics, there is also the issue of alienating people by using terms different from the common usage; if thinking of linear transformations, bilinear forms, tensors, etc. as matrices was sufficient for them, it may be too much overhead to try to upend the common usage in that area. Throw in cases where other fields like machine learning co-opt terms like "tensor" or give new names to existing concepts, and you get a great mess of multiple terms referring to the same thing, or single terms referring to different things. But this is just how language works, and it is something we have to deal with.

I think the best you can do is to make sure you understand these distinctions (which you obviously do) and try to notice how different areas use (or "misuse") terms, becoming more "fluent" in usage. You can of course educate others to think about these objects in the way you do, but I think pushing entire fields to adopt more consistent terminology is not worth one's effort.