[Math] Vector outer product for more than two vectors

linear algebra, vectors

I am reading the paper "Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization" and am trying to understand the following equation:

$\mathbf{R} \approx \sum \limits_{d=1}^D U_{d \ ,\ :} \circ \ V_{d \ ,\ :} \circ \ T_{d \ ,\ :}$

where $\mathbf{R} \in \mathbb{R}^{N \times M \times K}$, $U \in \mathbb{R}^{D \times N}$, $V \in \mathbb{R}^{D \times M}$, $T \in \mathbb{R}^{D \times K}$, and $X_{d \ , \ :}$ denotes the $d$-th row of the matrix $X$.

My question is: how do we actually calculate the outer product of three vectors? For example, I took three row vectors $a, b, c$ of dimensions $N=10$, $M=8$ and $K=4$ respectively and tried to calculate:

$OP=a \circ b \circ c = a \circ (b^T*c) = a^T*(b^T*c)$, which fails because the dimensions do not match for matrix multiplication: $a^T$ is $10 \times 1$ while $b^T*c$ is $8 \times 4$. (So the outer product does not seem to be associative.)
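For reference, here is the entrywise definition in standard tensor notation (this is an assumption about the intended meaning of $\circ$, not a quote from the paper). The outer product of three vectors is usually taken to be a three-way array rather than a chain of matrix products:

$$(a \circ b \circ c)_{ijk} = a_i \, b_j \, c_k, \qquad 1 \le i \le N,\ 1 \le j \le M,\ 1 \le k \le K.$$

Under this definition the grouping does not matter, since $((a \circ b) \circ c)_{ijk} = (a_i b_j) c_k = a_i (b_j c_k) = (a \circ (b \circ c))_{ijk}$. What does not carry over to three vectors is only the matrix shorthand $a^T*b$ for $a \circ b$.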

They have also given a scalar version,

$R_{ij}^k \approx \langle U_i, V_j, T_k \rangle = \sum \limits_{d=1}^D U_{di}V_{dj}T_{dk}$, where $U_i, V_j, T_k$ are all $D$-dimensional vectors.

Update

Well, I can calculate the outer product as follows:

First compute the outer product of $a$ and $b$ as $a^T*b$, which has size $10 \times 8$. Treat it as 10 instances of $1 \times 8$ vectors (by rearranging the dimensions to $1 \times 8 \times 10$). Then multiply each instance by the third vector $c$, which gives an $8 \times 4$ matrix per instance. The final result is a three-dimensional array of size $10 \times 8 \times 4$.

This gives the correct answer, but is it the appropriate way? It looks like an engineered solution to me: I could only do it because I already knew the expected result.
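The two-step procedure described above can be sketched in NumPy and compared against a direct entrywise construction. This is only an illustrative sketch: the random vectors, the fixed seed, and the variable names (`op_two_step`, `op_direct`, `op_other`) are assumptions made for the example, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = rng.random(10)  # N = 10
b = rng.random(8)   # M = 8
c = rng.random(4)   # K = 4

# Two-step construction: outer product of a and b, then scale by each entry of c.
ab = np.outer(a, b)               # shape (10, 8)
op_two_step = ab[:, :, None] * c  # broadcasts to shape (10, 8, 4)

# Direct construction by broadcasting: entry (i, j, k) is a[i] * b[j] * c[k].
op_direct = a[:, None, None] * b[None, :, None] * c[None, None, :]

# Grouping the other way, a with (b outer c), gives the same array.
op_other = a[:, None, None] * np.outer(b, c)[None, :, :]

print(op_direct.shape)
print(np.allclose(op_two_step, op_direct), np.allclose(op_other, op_direct))
```

Both groupings agree with the direct construction, which suggests the two-step approach is a legitimate way to build the array and not just a trick that happens to work.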

Best Answer

In Python, NumPy provides a very useful function that typically solves this problem in one line: numpy.einsum (for Einstein summation convention). Quoting the doc:

Using the Einstein summation convention, many common multi-dimensional array operations can be represented in a simple fashion. This function provides a way to compute such summations. The best way to understand this function is to try the examples below, which show how many common NumPy functions can be implemented as calls to einsum.

Tricky detail: when no output is specified (no '->' in the subscript string), the alphabetical order of the index labels matters: np.einsum('am', A) outputs $A$, while np.einsum('ma', A) outputs $A^\top$.
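A small check of this ordering behaviour, using an arbitrary $2 \times 3$ array chosen only for illustration. With no `->`, the output axes are sorted alphabetically by label; with `->`, the output order is exactly what is written.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

# Implicit mode (no '->'): output axes are ordered alphabetically by label.
same = np.einsum('am', A)        # labels already in order, so A is returned unchanged
transposed = np.einsum('ma', A)  # output is ordered 'am', so the axes are swapped

# Explicit mode: the output order is whatever follows '->'.
explicit = np.einsum('ma->ma', A)

print(same.shape, transposed.shape, explicit.shape)
```

Writing the output labels explicitly, as in the three-vector call, sidesteps this pitfall.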

In your case:

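import numpy as np

d = 10               # number of latent factors D (arbitrary value for illustration)
n, m, k = 10, 8, 4   # N, M, K as in the question
U = np.random.rand(d, n)   # row U[a] plays the role of U_{a,:}
V = np.random.rand(d, m)
T = np.random.rand(d, k)
# R[i, j, k] = sum over a of U[a, i] * V[a, j] * T[a, k]
R = np.einsum('ai,aj,ak->ijk', U, V, T)   # shape (n, m, k)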
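As a sanity check, the einsum call can be compared with the formula from the question written out explicitly, both as a sum of $D$ three-way outer products and as the scalar version for a single entry. This is a sketch under assumed sizes ($D=3$, $N=10$, $M=8$, $K=4$) with random data; `R_sum` and `r_ijk` are names introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
D, N, M, K = 3, 10, 8, 4        # assumed sizes for illustration
U = rng.random((D, N))
V = rng.random((D, M))
T = rng.random((D, K))

# One-line version: sum over the shared index a, keep i, j, k.
R = np.einsum('ai,aj,ak->ijk', U, V, T)

# Explicit version: add up D three-way outer products of matching rows.
R_sum = np.zeros((N, M, K))
for d in range(D):
    R_sum += np.einsum('i,j,k->ijk', U[d], V[d], T[d])

# Scalar version for a single entry: sum over d of U[d, i] * V[d, j] * T[d, k].
i, j, k = 2, 5, 1
r_ijk = np.sum(U[:, i] * V[:, j] * T[:, k])

print(np.allclose(R, R_sum), np.isclose(R[i, j, k], r_ijk))
```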