Solved – Does the "number of unrollings" of an RNN always have to match the length of the input sequence?

deep learning, neural networks

An RNN can be visualized in two ways: rolled and unrolled, as in the following picture

[Figure: an RNN drawn rolled up (left) and unrolled over three time steps (right)]

I have seen in a few implementations the "number of unrollings" as a parameter or variable of the system, which made me wonder whether the number of "unrollings" ($3$ in the picture above) depends on the length of the input sequence.

In chapter 10.1 of the book Deep Learning (by Goodfellow et al.) it is stated:

What we call unfolding is the operation that maps a circuit, as in the left side of the figure, to a computational graph with repeated pieces, as in the right side. The unfolded graph now has a size that depends on the sequence length.

So, is the number of times we unroll the recurrent connection always equal to the length of the input sequence, or not?

I've just come across this post, where the author asks whether there is any difference between unrolling an RNN once or multiple times. The answer to that question states that the "number of unrollings" matters only during training, which seems inconsistent with the information from the book.

Best Answer

What the book mentions and what the author of the post meant are two different things.

As the book mentions, 'unfolding' depends on the length of the input sequence. To see this, suppose you want to write out the exact computations that happen in an RNN: you have to 'unfold' the network, and the size of the 'unfolded' graph depends on the length of the input sequence. For more information, refer to this page, which says: "By unrolling we simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word."
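A minimal NumPy sketch may make this concrete (all names and sizes here are illustrative, not taken from the book): one vanilla RNN cell $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$ is applied once per input element, so the unrolled graph contains exactly as many copies of the cell as the sequence has elements.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size = 4, 8
    W_x = rng.standard_normal((hidden_size, input_size)) * 0.1
    W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
    b = np.zeros(hidden_size)

    def step(x_t, h_prev):
        """One application of the shared cell, i.e. one 'copy' in the unrolled graph."""
        return np.tanh(W_x @ x_t + W_h @ h_prev + b)

    # "Unrolling" = writing the cell out once per element of the sequence,
    # so the unrolled graph reuses the same weights len(xs) times.
    xs = [rng.standard_normal(input_size) for _ in range(5)]  # a 5-step sequence
    h = np.zeros(hidden_size)
    hidden_states = []
    for x_t in xs:  # 5 inputs -> 5 unrolled copies of the cell
        h = step(x_t, h)
        hidden_states.append(h)

    print(len(hidden_states))  # 5: the unrolled depth matches the sequence length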

In the case of the post, what the author meant is that during training you need 'unrolling' because you have to store the activations/hidden states for backpropagation through time. During testing, you don't need to store the hidden states (since there is no backpropagation), so no 'unrolling' is required.
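Continuing the sketch above (reusing the same illustrative step() function), the distinction amounts to whether the per-step hidden states are kept around or overwritten:

    # Training: keep every hidden state, because backpropagation through time
    # needs the activation at each step to compute gradients.
    def forward_for_training(xs, h0):
        h, cache = h0, []
        for x_t in xs:
            h = step(x_t, h)
            cache.append(h)  # stored for BPTT
        return h, cache

    # Inference: only the most recent hidden state is needed; earlier states
    # can be discarded, so memory is constant regardless of sequence length.
    def forward_for_inference(xs, h0):
        h = h0
        for x_t in xs:
            h = step(x_t, h)  # overwrite the previous state
        return h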
