Solved – Time steps in Keras LSTM

keras, lstm, machine learning, recurrent neural network

My understanding of time-series LSTM training is that the recurrent cell gets unrolled to a specified length (num_steps), and parameter updates are back-propagated along that length. However, once trained, an LSTM cell should be able to accept any number of time steps and produce an output.

For example, let's say I have a single-layer LSTM that accepts, at each time step, the temperature, humidity, and wind direction vector (2D direction) for 3 cities (4 * 3 = 12 features per time step), and predicts the temperature and humidity in a 4th city nearby (2 output features, for time t+1).

Let's say for training, I set num_steps=10, and batch_size=16.

That means it will accept an input tensor of shape (16, 10, 12) for training, and the Keras LSTM layer will be initialised with input_shape=(10, 12). I feed it a large dataset, run a few epochs, and the LSTM cell is trained.
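For concreteness, here is a minimal sketch of that setup (the hidden size of 32 and the random placeholder data are just for illustration, not part of the question):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10, 12)),   # num_steps=10, 12 features per time step
    keras.layers.LSTM(32),         # hidden size is an arbitrary choice here
    keras.layers.Dense(2),         # temperature and humidity of the 4th city at t+1
])
model.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(16, 10, 12)   # (batch_size, num_steps, features)
y_train = np.random.rand(16, 2)
model.fit(x_train, y_train, epochs=2, verbose=0)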

Once trained, I should be able to feed in any number of time steps, right? Like I could feed in 8 time steps and get an output, or 50 time steps and get an output. I should not be restricted to the 10 that I specified for unrolling during training. My understanding is that this fixed-length unrolling is only necessary for training, and is essentially a limitation of the back-propagation algorithm.

My understanding is that this is the whole point of RNNs: the input length is arbitrary; the LSTM cell that processes the input at time t is the same cell that processed the input at t-1 (only the input and the state differ).

The reason I'm asking is that everywhere I look, num_steps seems to become an intrinsic property of the trained network that cannot be changed: I must always feed in that many time steps to get an output. Moreover, it seems that increasing num_steps grows the number of parameters. If the number of time steps must be fixed, then I do not see the advantage of an RNN/LSTM over a standard feed-forward network with num_steps * num_features input nodes.

Do I have the wrong understanding of RNNs/LSTMs, or am I misunderstanding the Keras documentation/examples, or is this simply a limitation of Keras?

Best Answer

As described by Andrej Karpathy, the basic recurrent neural network cell is something like

$$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t) $$

so it takes the previous hidden state $h_{t-1}$ and the current input $x_t$ to produce the hidden state $h_t$. Notice that $W_{hh}$ and $W_{xh}$ are not indexed by time $t$: we use the same weights at every timestep. In simplified Python code, the forward pass is basically a for-loop:

for t in range(timesteps):
    # the same Wxh and Whh are reused at every timestep
    h[t] = np.tanh(np.dot(Wxh, x[t]) + np.dot(Whh, h[t - 1]))

So it doesn't matter how many timesteps there are; it is just a matter of how it is implemented. People often use a fixed number of timesteps to simplify the code and to work with simpler data structures.
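To make this concrete, here is a runnable version of that loop (the sizes are arbitrary). The extra row in h plays the role of the zero initial state, so h[t - 1] is well-defined at t = 0, and nothing about the weights depends on timesteps:

import numpy as np

hidden_size, input_dim, timesteps = 8, 12, 50           # arbitrary sizes
Wxh = 0.01 * np.random.randn(hidden_size, input_dim)    # input-to-hidden weights
Whh = 0.01 * np.random.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

x = np.random.randn(timesteps, input_dim)       # a sequence of any length
h = np.zeros((timesteps + 1, hidden_size))      # last row doubles as the zero initial state
for t in range(timesteps):
    h[t] = np.tanh(np.dot(Wxh, x[t]) + np.dot(Whh, h[t - 1]))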

In Keras, the recurrent layers take input tensors of shape (batch_size, timesteps, input_dim), but you can set the first two dimensions to None if you want to allow varying sizes. For example, with (None, None, input_dim) the layer will accept batches of any size and any number of timesteps, with input_dim features per step (this one does need to be fixed). This is possible precisely because the forward pass is a for-loop that applies the same function at every timestep. It would be more complicated otherwise, since varying sizes would require things like parameter matrices whose shapes change with the input (say, in a densely-connected layer).
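As a sketch (the layer sizes are again arbitrary), a model built this way will happily consume sequences of different lengths at prediction time:

import numpy as np
from tensorflow import keras

input_dim = 12
model = keras.Sequential([
    keras.Input(shape=(None, input_dim)),  # None timesteps -> full input shape (None, None, input_dim)
    keras.layers.LSTM(32),
    keras.layers.Dense(2),
])
model.compile(optimizer="adam", loss="mse")

# The same weights work for any sequence length:
print(model.predict(np.random.rand(1, 8, input_dim)).shape)   # (1, 2)
print(model.predict(np.random.rand(1, 50, input_dim)).shape)  # (1, 2)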
