Linear Algebra – Matrix Multiplication for n-Dimensional Arrays

Tags: linear-algebra, matrices, tensor-products

Background

Within the context of my research, I have been working with a vector-based model that treats entities of a function-like language as vectors:

  • Some of these entities are "objects", represented as simple vectors in an object space $A$.
  • Others act as linear maps $f : A \to B$ over objects, and are vectors in some space $B \otimes A$.

Such map vectors can be represented as $\dim(B)$ by $\dim(A)$ matrices, such that for any vector $\mathbf{v} \in A$ and any vector $\mathbf{f} \in B \otimes A$ represented as a matrix $M_{\mathbf{f}}$, applying the function that $\mathbf{f}$ encodes to the object that $\mathbf{v}$ encodes is simply the matrix multiplication $M_{\mathbf{f}} \times \mathbf{v}$.

Trivially, the result of this computation is a vector in $B$, hence fitting the interpretation that $\mathbf{f}$ encodes a function $f: A \to B$.
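As an illustrative sketch of this setup (the dimensions and names `dim_A`, `M_f`, etc. are made up for the example; any NumPy-like library would do):

```python
import numpy as np

dim_A, dim_B = 3, 2
rng = np.random.default_rng(0)

v = rng.normal(size=dim_A)             # an "object" vector in A
M_f = rng.normal(size=(dim_B, dim_A))  # a map f : A -> B as a dim(B) x dim(A) matrix

result = M_f @ v                       # function application = matrix-vector product
assert result.shape == (dim_B,)        # the result lives in B, as expected
```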

Problem

We want this sort of representation to scale with the arity of the function. For example, functions of type $g: A \times B \to C$ are represented by vectors in $C \otimes B \otimes A$. Within the formalism we work with, there is a way of effectively computing the equivalent of the operation performed above to apply $f$ to some object $\mathbf{v} \in A$.

However, it would be convenient (and more elegant) if there were some way of representing such an application with matrix-like operations. This is to say, either:

  • To allow the curried application of $\mathbf{g} \in C \otimes B \otimes A$ to some $\mathbf{v} \in A$, thereby obtaining vector $\mathbf{h} \in C \otimes B$, which would effectively be a $\dim(C)$ by $\dim(B)$ matrix $M_\mathbf{h}$ representing a function $h: B \to C$.
  • To allow the application of $\mathbf{g} \in C \otimes B \otimes A$ to some $\mathbf{v \otimes w} \in A \otimes B$ (representable as a $\dim(A)$ by $\dim(B)$ matrix $M_{\mathbf{v \otimes w}}$), thereby obtaining a result vector $\mathbf{r} \in C$.

Ideally, there exists some notion of representing rank-$n$ tensors as $n$-dimensional matrices (i.e. seeing vectors as 1-dimensional matrices, regular matrices as 2-dimensional matrices, etc.) which would allow us to represent the ternary function $g$ described above as some 3-dimensional array $M_{\mathbf{g}}$ such that:

  • $M_{\mathbf{g}}$ can be applied to a vector $\mathbf{v} \in A$ to obtain $M_{\mathbf{g}} \times \mathbf{v} = M_{\mathbf{h}}$, a 2-dimensional matrix which in turn can be applied to $\mathbf{w} \in B$ so that $M_{\mathbf{h}} \times \mathbf{w} = \mathbf{r}$ (curried function application).
  • $M_{\mathbf{g}}$ can be applied directly to the matrix representation of $\mathbf{v \otimes w} \in A \otimes B$ to obtain the result vector $M_{\mathbf{g}} \times M_{\mathbf{v \otimes w}} = \mathbf{r}$ (regular function application).
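To make the two desired modes of application concrete, here is a sketch using NumPy's `tensordot` (the axis ordering below is one possible convention, chosen so that $\mathbf{v} \otimes \mathbf{w}$ becomes a $\dim(B) \times \dim(A)$ array; the dimensions are arbitrary):

```python
import numpy as np

dim_A, dim_B, dim_C = 3, 4, 2
rng = np.random.default_rng(1)

M_g = rng.normal(size=(dim_C, dim_B, dim_A))  # g in C ⊗ B ⊗ A as a 3-dimensional array
v = rng.normal(size=dim_A)
w = rng.normal(size=dim_B)

# Curried application: contract the last axis of M_g with v,
# leaving a dim(C) x dim(B) matrix M_h representing h : B -> C.
M_h = np.tensordot(M_g, v, axes=([2], [0]))
r_curried = M_h @ w                           # then apply M_h to w, landing in C

# Direct application: contract M_g against the matrix of v ⊗ w in one step.
M_vw = np.outer(w, v)                         # dim(B) x dim(A) array for v ⊗ w
r_direct = np.tensordot(M_g, M_vw, axes=([1, 2], [0, 1]))

# Both routes compute sum over a, b of M_g[c, b, a] * v[a] * w[b].
assert np.allclose(r_curried, r_direct)
```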

Questions

Is this concept of treating rank $n$ tensors as $n$-dimensional matrices (or, if this term is somehow inappropriate, $n$-dimensional arrays) discussed anywhere in the mathematics literature? I have looked around, but have been unable to find any canonical definition.

Likewise, is there anything in the mathematics literature about doing matrix multiplication with $n$-dimensional arrays, as described above?

In practice, the notion of $n$-dimensional matrices is present in several programming languages (e.g. in MATLAB, and in NumPy/SciPy for Python), where they can be multiplied with $k$-dimensional matrices as well as lower-dimensional ones, subject to the same sort of constraints as normal matrix multiplication: the "length" of the "last" dimension of a matrix must match that of the "first" dimension of the matrix it is applied to, just as an $m \times n$ matrix can only be multiplied with an $n \times l$ matrix.
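The "last dimension against first dimension" rule described above can be sketched in NumPy with `tensordot` using `axes=1` (the shapes here are arbitrary placeholders):

```python
import numpy as np

a = np.ones((2, 3, 4))  # a 3-dimensional array whose "last" dimension has length 4
b = np.ones((4, 5))     # a matrix whose "first" dimension matches

# axes=1 contracts a's last axis with b's first axis, generalizing
# the (m x n) times (n x l) rule to higher-dimensional arrays.
c = np.tensordot(a, b, axes=1)
assert c.shape == (2, 3, 5)
```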

These sorts of data structures fit the bill, but I cannot find anything in the literature that describes such structures formally. I would appreciate your help if you know of any.

I appreciate the time you took to look over this rather long problem specification. I hope I was clear in explaining what I'm looking for, and look forward to reading your answers. Thanks in advance for any help you can provide me with!

Best Answer

It sounds like you want the notion of tensor contraction, possibly after specifying for each of your vector spaces an identification with its dual. Tensor contraction generalizes two very familiar operations in linear algebra, namely taking traces and ordinary matrix multiplication. Rather than "$n$-dimensional matrix" (which to me means $n \times n$ matrix) I would say "$n$-tensor."
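For instance, both of these familiar operations can be written as contractions of index pairs, here sketched with NumPy's `einsum` (the matrices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
N = rng.normal(size=(3, 3))

# Trace: contract the two indices of a single 2-tensor.
assert np.isclose(np.einsum('ii->', M), np.trace(M))

# Matrix multiplication: contract the inner index pair of M ⊗ N.
assert np.allclose(np.einsum('ij,jk->ik', M, N), M @ N)
```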