[Math] Tensor mode product

tensor-productstensors

The mode-$i$ product (Tensor matrix product) definition:

Given a Tensor $\mathcal{T} \in \mathbb{R}^{L_1 \times L_2 \times \ldots \times L_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{r \times L_i}$ then $\mathcal{T} \times_i \mathbf{U} \in \mathbb{R}^{L_1 \times L_2\times\ldots\times L_{i-1}\times r \times L{i+1}\ldots\times L_N}$.


According to the above definition, I would expect that
$$\mathcal{T}\times_1\mathbf{U}_1 \times_2 \mathbf{U}_2\ldots\times_N \mathbf{U}_N$$ is a tensor of size ${I_1 \times I_2\times\ldots\times I_N}$ when $\mathbf{U}_i \in \mathbb{R}^{I_i \times L_i} ~\forall~ i$.

However, whenever I see this product defined
$$\mathcal{T}\times_1\mathbf{U}_1 \times_2 \mathbf{U}_2\ldots\times_N \mathbf{U}_N$$
matrices $\mathbf{U}_i \in \mathbb{R}^{L_i \times I_i}$.


My question is why $\mathbf{U}_i \in \mathbb{R}^{L_i \times I_i}$? Shouldn't it be $\mathbf{U}_i \in \mathbb{R}^{I_i \times L_i}$?

Best Answer

The definition on pages 14-15 of the document referenced in another answer defines this product as follows:

The $n$-mode (matrix) product of a tensor $\mathcal X \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ with a matrix $\mathbf U \in \mathbb{R}^{J \times I_n}$ is denoted by $\mathcal{X} \times_n \mathbf U$ and is of size $I_1 \times \ldots I_{n-1} \times J \times I_{n+1} \times \ldots \times I_n$. Elementwise, we have $$ \left(\mathcal X \times_n \mathbf U \right)_{i_1\ldots i_{n-1}\ j\ i_{n+1} \ \ldots \ i_N} = \sum_{i_n=1}^{I_n}x_{i_1i_2\ldots i_N }\ u_{j i_n}. $$ Each mode-$n$ fiber [of $\mathcal X$] is multiplied by the matrix $\mathbf U$.

We will use the last line of this excerpt to make a dimensional argument about the shapes of the matrices $\mathbf U^{(n)}$ in the chain $\mathcal X \times_1 \mathbf U^{(1)} \times_2 \mathbf U^{(2)} \ldots \times_{N} \mathbf U^{(N)}$. We first clarify the meaning of a few terms. A general tensor $\mathcal X \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ has order $N$, or equivalently has $N$ dimensions. The length of the $n$th dimension of $\mathcal X$ is $I_n$. A mode-$n$ fiber is a vector obtained by fixing all indices of $\mathcal X$ except for the $n$th dimension - if $\mathcal X$ is second order, for example, then the mode-1 fibers are the columns and the mode-2 fibers are the row.

Let's now consider an arbitrary product $\mathcal C=\mathcal X \times_n \mathbf U$ to try to understand why the result has the shape it does, and also to determine the restrictions on the shape of $\mathbf U$. The result will be a tensor of the same order as $\mathcal X$ in which the mode-$n$ fibers are multiplied (meaning standard matrix multiplication) by $\mathbf U$. Thus the length of each dimension of $\mathcal C$ will be the same as the length of the corresponding dimension of $\mathcal X$, except for the $n$th dimension, which will now have length $J$, because the mode-$n$ fibers of $\mathcal C$ are the result of multiplying the mode-$n$ fibers of $\mathcal X$ (which have $I_n$ rows) by the $J \times I_n$ matrix $\mathbf U$. Therefore, in order for the symbol $\mathcal X \times_n \mathbf U$ to make sense, we need to be able to matrix-multiply the mode-$n$ fibers of $\mathcal X$ by $\mathbf U$; i.e., the number of columns of $\mathbf U$ must be the same as the length of the $n$th dimension of $\mathcal X$.

Now let's think about how it would make sense to define something like $\mathcal X \times_n \mathbf U \times_m \mathbf V$. We need the thing on the left of the $\times_m$ symbol to be a tensor, so let's decide to first resolve $\mathcal X \times_n \mathbf U$, since we know this gives us a tensor. If $\mathcal X$ is $I_1 \times \ldots \times I_N$, then we know from the preceding that $\mathbf U$ needs to have $I_n$ columns but can have any number of rows, which we'll call $J_u$. Then $\mathcal X \times_n \mathbf U$ has dimensions $I_1 \times \ldots \times I_{n-1} \times J_u \times I_{n+1} \times \ldots \times I_N$. Now let's apply $\times_m \mathbf V$ to this. For this to make sense, the number of columns of $\mathbf V$ must agree with the length of the $m$th dimension of $\mathcal X \times_n U$, which will be $I_m$ if $n \ne m$ and $J_u$ if $n=m$.

We can apply the same reasoning to compute a chain of any length of these products, so to compute $\mathcal X \times_1 \mathbf U^{(1)} \times_2 \mathbf U^{(2)} \ldots \times_{N} \mathbf U^{(N)}$ we see by the preceding arguments that $\mathbf U^{(n)}$ needs to have $I_n$ columns and can have any number of rows; i.e., $\mathbf U^{(n)} \in \mathbb R^{J_n \times I_n}$ where $J_n$ is free.

Related Question