Solved – WaveNet is not really a dilated convolution, is it?

conv-neural-network, deep-learning, neural-networks, tensorflow

In the recent WaveNet paper, the authors refer to their model as having stacked layers of dilated convolutions.
They also produce the following charts, explaining the difference between 'regular' convolutions and dilated convolutions.

The regular convolution looks like this.
Non dilated Convolutions
This is a convolution with a filter size of 2 and a stride of 1, repeated for 4 layers.

They then show an architecture used by their model, which they refer to as dilated convolutions. It looks like this.
WaveNet Dilated Convolutions
They say that each layer has increasing dilations of (1, 2, 4, 8). But to me this looks like a regular convolution with a filter size of 2 and a stride of 2, repeated for 4 layers.

As I understand it, a dilated convolution, with a filter size of 2, stride of 1, and increasing dilations of (1, 2, 4, 8), would look like this.
Actual Dilated Convolution

In the WaveNet diagram, none of the filters skip over an available input. There are no holes. In my diagram, each filter skips over (d – 1) available inputs. This is how dilation is supposed to work, no?
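
For concreteness, here is a minimal NumPy sketch of the distinction I mean (the `conv1d` helper is my own illustration, not the WaveNet code): with a filter of size 2, a stride of 2 halves the output length without skipping any inputs, while a dilation of 2 with stride 1 skips one available input between taps and keeps the output nearly as long as the input.

```python
import numpy as np

def conv1d(x, w, stride=1, dilation=1):
    # Toy causal 1-D convolution: the output at position t reads
    # x[t], x[t - dilation], ..., so a dilation of d skips (d - 1)
    # available inputs between filter taps.
    k = len(w)
    span = dilation * (k - 1) + 1                  # inputs covered by one filter
    return np.array([
        sum(w[i] * x[t - dilation * (k - 1 - i)] for i in range(k))
        for t in range(span - 1, len(x), stride)
    ])

x = np.arange(16, dtype=float)
w = np.array([1.0, 1.0])                           # filter size 2

print(len(conv1d(x, w, stride=2, dilation=1)))     # 8  -> strided: output halves
print(len(conv1d(x, w, stride=1, dilation=2)))     # 14 -> dilated: output stays (almost) full length
```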

So my question is, which (if any) of the following propositions are correct?

  1. I don't understand dilated and/or regular convolutions.
  2. Deepmind did not actually implement a dilated convolution, but rather a strided convolution, and misused the word 'dilation'.
  3. Deepmind did implement a dilated convolution, but did not implement the chart correctly.

I am not fluent enough in TensorFlow to understand exactly what their code is doing, but I did post a related question on Stack Exchange, which contains the bit of code that could answer this question.

Best Answer

From the WaveNet paper:

"A dilated convolution (also called a trous, or convolution with 
holes) is a convolution where the filter is applied over an area larger 
than its length by skipping input values with a certain step. It is 
equivalent to a convolution with a larger filter derived from the 
original filter by dilating it with zeros, but is significantly more 
efficient. A dilated convolution  effectively allows the network to 
operate on a coarser scale than with a normal convolution. This is 
similar to pooling or strided  convolutions, but 
here the output has the same size as the input. As a special case, 
dilated convolution with dilation 1 yields the standard convolution. 
Fig. 3 depicts dilated causal convolutions for dilations 1, 2, 4, and 
8."

The animation shows a fixed stride of one and the dilation factor increasing on each layer.
Animated Fig. 3 from Google's WaveNet blog post
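
As a sketch of that stacking pattern in TensorFlow (the 32 channels and the 16000-sample input are illustrative choices of mine, not values from the WaveNet code), four causal convolution layers with kernel size 2, stride 1, and dilation rates 1, 2, 4, 8 all keep the output as long as the input, while the receptive field grows to 1 + (1 + 2 + 4 + 8) = 16 samples:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(16000, 1))   # e.g. one second of 16 kHz audio
x = inputs
for rate in (1, 2, 4, 8):
    x = tf.keras.layers.Conv1D(
        filters=32,
        kernel_size=2,
        strides=1,               # the stride stays fixed at 1 ...
        dilation_rate=rate,      # ... only the dilation grows per layer
        padding='causal',        # left-pad so the output length matches the input
    )(x)

model = tf.keras.Model(inputs, x)
model.summary()                  # every layer's output keeps length 16000
```

With ordinary stride-2 convolutions instead, each layer would halve the time axis, which is exactly the difference the question is asking about.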