Solved – Time steps in Keras LSTM

keras, lstm, machine learning, recurrent neural network

My understanding of time-series LSTM training is that the recurrent cell gets unrolled to a specified length (num_steps), and parameter updates are back-propagated along that length. However, once trained, an LSTM cell should be able to accept any number of time steps and produce an output.

For example, let's say I have a single-layer LSTM that accepts, at each time step, the temperature, humidity, and wind direction vector (2D direction) for 3 cities (4 * 3 = 12 features per time step), and predicts the temperature and humidity in a 4th city nearby (2 output features, for time t+1).

Let's say for training, I set num_steps=10, and batch_size=16.

That means it will accept an input tensor of shape (16, 10, 12) for training, and the Keras LSTM layer will be initialised with input_shape=(10, 12). I feed it a large dataset, run a few epochs, and the LSTM cell is trained.
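For concreteness, here is a minimal sketch of that setup (the hidden size of 32 and the random placeholder data are just for illustration, not part of the question):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10, 12)),   # num_steps=10, 12 features per time step
    keras.layers.LSTM(32),         # hidden size is an arbitrary choice here
    keras.layers.Dense(2),         # temperature and humidity of the 4th city at t+1
])
model.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(16, 10, 12)   # (batch_size, num_steps, features)
y_train = np.random.rand(16, 2)
model.fit(x_train, y_train, epochs=2, verbose=0)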

Once trained, I should be able to feed in any number of time steps, right? Like I could feed in 8 time steps and get an output, or 50 time steps and get an output. I should not be restricted to the 10 that I specified for unrolling during training. My understanding is that this fixed-length unrolling is only necessary for training, and is essentially a limitation of the back-propagation algorithm.

My understanding is that this is the whole point of RNNs: the input length is arbitrary; the LSTM cell that processes the input at time t is the same cell that processed the input at t-1 (only the input and the state differ).

The reason I'm asking is that everywhere I look, num_steps seems to become an intrinsic property of the trained network that cannot be changed: I must always feed in that many time steps to get an output. Moreover, it seems that increasing num_steps grows the number of parameters. If the number of time steps must be fixed, then I do not see the advantage of an RNN/LSTM over a standard feed-forward network with num_steps * num_features input nodes.

Do I have the wrong understanding of RNNs/LSTMs, or am I misunderstanding the Keras documentation/examples, or is this simply a limitation of Keras?

Best Answer

As described by Andrej Karpathy, the basic recurrent neural network cell is something like

$$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t) $$

so it takes the previous hidden state $h_{t-1}$ and the current input $x_t$ to produce the hidden state $h_t$. Notice that $W_{hh}$ and $W_{xh}$ are not indexed by time $t$: we use the same weights at every timestep. In simplified Python code, the forward pass is basically a for-loop:

for t in range(timesteps):
    # the same Wxh and Whh are reused at every timestep
    h[t] = np.tanh(np.dot(Wxh, x[t]) + np.dot(Whh, h[t - 1]))

So it doesn't matter how many timesteps there are; it is just a matter of how it is implemented. People often use a fixed number of timesteps to simplify the code and to work with simpler data structures.
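To make this concrete, here is a runnable version of that loop (the sizes are arbitrary). The extra row in h plays the role of the zero initial state, so h[t - 1] is well-defined at t = 0, and nothing about the weights depends on timesteps:

import numpy as np

hidden_size, input_dim, timesteps = 8, 12, 50           # arbitrary sizes
Wxh = 0.01 * np.random.randn(hidden_size, input_dim)    # input-to-hidden weights
Whh = 0.01 * np.random.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

x = np.random.randn(timesteps, input_dim)       # a sequence of any length
h = np.zeros((timesteps + 1, hidden_size))      # last row doubles as the zero initial state
for t in range(timesteps):
    h[t] = np.tanh(np.dot(Wxh, x[t]) + np.dot(Whh, h[t - 1]))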

In Keras, the recurrent layers take input tensors of shape (batch_size, timesteps, input_dim), but you can set the first two dimensions to None if you want to allow varying sizes. For example, with (None, None, input_dim) the layer will accept batches of any size and any number of timesteps, with input_dim features per step (this one does need to be fixed). This is possible precisely because the forward pass is a for-loop that applies the same function at every timestep. It would be more complicated otherwise, since varying sizes would require things like parameter matrices whose shapes change with the input (say, in a densely-connected layer).
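As a sketch (the layer sizes are again arbitrary), a model built this way will happily consume sequences of different lengths at prediction time:

import numpy as np
from tensorflow import keras

input_dim = 12
model = keras.Sequential([
    keras.Input(shape=(None, input_dim)),  # None timesteps -> full input shape (None, None, input_dim)
    keras.layers.LSTM(32),
    keras.layers.Dense(2),
])
model.compile(optimizer="adam", loss="mse")

# The same weights work for any sequence length:
print(model.predict(np.random.rand(1, 8, input_dim)).shape)   # (1, 2)
print(model.predict(np.random.rand(1, 50, input_dim)).shape)  # (1, 2)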
