As described by Andrej Karpathy, the basic recurrent neural network cell is something like
$$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t) $$
so it takes the previous hidden state $h_{t-1}$ and the current input $x_t$ to produce the new hidden state $h_t$. Notice that $W_{hh}$ and $W_{xh}$ are not indexed by time $t$: the same weights are used at every timestep. In simplified Python code, the forward pass is basically a for-loop:
for t in range(timesteps):
    h[t] = np.tanh(np.dot(Wxh, x[t]) + np.dot(Whh, h[t-1]))
So the number of timesteps does not change the model at all; it only changes how many times the loop runs. People often use a fixed number of timesteps anyway, to simplify the code and work with simpler data structures.
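To make the loop above self-contained, here is a minimal runnable sketch (the function name `rnn_forward` and the weight initialization are mine, and biases are omitted as in the formula). The same weights handle sequences of any length:

```python
import numpy as np

def rnn_forward(x, Wxh, Whh, h0):
    """Apply the same RNN cell to each row of x (shape: timesteps x input_dim)."""
    h = h0
    states = []
    for x_t in x:                         # one iteration per timestep
        h = np.tanh(Wxh @ x_t + Whh @ h)  # same Wxh, Whh every time
        states.append(h)
    return states

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
Wxh = rng.standard_normal((hidden_dim, input_dim))
Whh = rng.standard_normal((hidden_dim, hidden_dim))
h0 = np.zeros(hidden_dim)

# One set of weights, two different sequence lengths:
short = rnn_forward(rng.standard_normal((2, input_dim)), Wxh, Whh, h0)
long = rnn_forward(rng.standard_normal((7, input_dim)), Wxh, Whh, h0)
print(len(short), len(long))  # 2 7
```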
In Keras, the recurrent layers take input tensors of shape (batch_size, timesteps, input_dim), but you can set the first two dimensions to None if you want to allow varying sizes. For example, with an input shape of (None, None, input_dim), the layer accepts batches of any size and sequences of any length, as long as each timestep has input_dim features (this dimension must be fixed). This is possible precisely because the forward pass is a for-loop applying the same function at every timestep. It would be more complicated in architectures where the parameter shapes themselves depend on the input size, as in a densely-connected layer, whose weight matrix is tied to the input dimension.
Best Answer
A GRU/LSTM Cell has no return_sequences option: it is just one cell of an unfolded GRU/LSTM unit, so it computes and returns the output for only a single timestep. The GRU/LSTM layer, on the other hand, does take a return_sequences argument; with return_sequences=True it returns the output states of all timesteps, and otherwise only the last one.
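The cell-versus-layer distinction can be sketched in NumPy (illustrative names, not the Keras API, and using the simple tanh cell from above in place of a GRU/LSTM cell for brevity): the cell computes one timestep, the layer loops the cell over the whole sequence, and return_sequences decides whether every state or only the last one is returned.

```python
import numpy as np

def cell(x_t, h_prev, Wxh, Whh):
    # one timestep -- roughly what a *Cell class computes
    return np.tanh(Wxh @ x_t + Whh @ h_prev)

def layer(x, Wxh, Whh, h0, return_sequences=False):
    # full recurrent layer: loops the cell over all timesteps
    h, outputs = h0, []
    for x_t in x:
        h = cell(x_t, h, Wxh, Whh)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(2)
input_dim, hidden_dim, timesteps = 3, 4, 6
Wxh = rng.standard_normal((hidden_dim, input_dim))
Whh = rng.standard_normal((hidden_dim, hidden_dim))
x = rng.standard_normal((timesteps, input_dim))
h0 = np.zeros(hidden_dim)

all_states = layer(x, Wxh, Whh, h0, return_sequences=True)
last_state = layer(x, Wxh, Whh, h0, return_sequences=False)
print(all_states.shape, last_state.shape)  # (6, 4) (4,)
```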
In Figure 1, the unit shown in the loop is a GRU/LSTM. In Figure 2, the cells shown are GRU/LSTM Cells, i.e. the individual steps of an unfolded GRU/LSTM unit.