Solved – the computational complexity of a 1D convolutional layer

conv-neural-network, natural-language, neural-networks, time-complexity

What is the complexity of a 1D convolutional layer? I'm getting $\mathcal{O}(n \cdot k \cdot d)$, but in Attention Is All You Need, Vaswani et al. report that it is $\mathcal{O}(k \cdot n \cdot d^2)$:

[Table 1 from Vaswani et al. (2017), comparing per-layer complexity: self-attention $\mathcal{O}(n^2 \cdot d)$, recurrent $\mathcal{O}(n \cdot d^2)$, convolutional $\mathcal{O}(k \cdot n \cdot d^2)$]

To me, a 1D convolution is the sum of the row-wise dot products of a filter $W \in \mathbb{R}^{k \times d}$ with a region matrix $A \in \mathbb{R}^{k \times d}$, where $k$ is the length of the filter and $d$ is the depth dimension (e.g., the dimensionality of the word embedding space).
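Concretely, with these definitions, the scalar output at a single position is

$$y \;=\; \sum_{i=1}^{k} \langle W_{i,:},\, A_{i,:} \rangle \;=\; \sum_{i=1}^{k} \sum_{j=1}^{d} W_{ij}\, A_{ij},$$

which costs $k \cdot d$ multiplications and $k \cdot d - 1$ additions.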

That gives us:

  • $\mathcal{O}(d)$ for one dot product ($d$ multiplications + $d-1$ additions)
  • we perform $k$ such dot products in total (one per row of $W$ and $A$), which amounts to $\mathcal{O}(k \cdot d)$ per output position
  • and finally, at the layer level, we apply the filter over the input $n-k+1$ times (where $n$ is the length of the input); let's say $n$ times, since $n \gg k$. This gives a final complexity of $\mathcal{O}(n \cdot k \cdot d)$, as the sketch below makes concrete.
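As a sanity check on this count, here is a minimal pure-NumPy sketch of the naive single-filter convolution described above, instrumented to count multiplications. The function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def conv1d_single_filter(X, W):
    """Naive 1D convolution of one filter W (k x d) over input X (n x d).

    Returns the length-(n - k + 1) output and the number of
    multiplications performed: (n - k + 1) * k * d.
    """
    n, d = X.shape
    k, _ = W.shape
    out = np.empty(n - k + 1)
    mults = 0
    for t in range(n - k + 1):      # slide over n - k + 1 positions
        A = X[t:t + k]              # region matrix, shape (k, d)
        out[t] = np.sum(W * A)      # sum of row-wise dot products: k * d mults
        mults += k * d
    return out, mults

X = np.random.randn(100, 64)        # n = 100, d = 64
W = np.random.randn(3, 64)          # k = 3
_, mults = conv1d_single_filter(X, W)
print(mults)                        # (100 - 3 + 1) * 3 * 64 = 18816
```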

What am I missing? Where does the authors' extra factor of $d$ come from?

Note: the paper does not make it entirely clear whether the authors refer to standard or dilated convolutions. While this choice may affect the maximum path length, I don't think it has any impact on the complexity.

Best Answer

I realized that what may be missing is the number of filters in the layer.

Even though the table has no symbol for it, the authors might be implicitly assuming that the number of filters is of the same order of magnitude as the depth dimension, or, even more simply, that the number of filters equals $d$ (in which case the conv layer preserves the depth dimensionality).

So, in that case, the time complexity indeed amounts to $\mathcal{O}(k \cdot n \cdot d^2)$, because we repeat the $\mathcal{O}(k \cdot n \cdot d)$ routine described in the question once for each of the $d$ filters.
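Under that assumption, a sketch of the full layer: stacking $d$ filters turns the filter bank into a $d \times k \times d$ tensor, and the multiplication count picks up the extra factor of $d$. Again, the names and shapes below are illustrative:

```python
import numpy as np

def conv1d_layer(X, filters):
    """Naive 1D conv layer; `filters` has shape (d_out, k, d).

    With d_out = d filters the output keeps depth d, and the total
    multiplication count is (n - k + 1) * d * k * d, i.e. O(n * k * d^2).
    """
    n, d = X.shape
    d_out, k, _ = filters.shape
    out = np.empty((n - k + 1, d_out))
    mults = 0
    for t in range(n - k + 1):                  # n - k + 1 output positions
        A = X[t:t + k]                          # region matrix, shape (k, d)
        for f in range(d_out):                  # repeat the O(k * d) routine per filter
            out[t, f] = np.sum(filters[f] * A)  # k * d multiplications
            mults += k * d
    return out, mults

X = np.random.randn(100, 64)                    # n = 100, d = 64
filters = np.random.randn(64, 3, 64)            # d_out = d = 64, k = 3
_, mults = conv1d_layer(X, filters)
print(mults)                                    # 98 * 64 * 3 * 64 = 1204224
```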