Solved – Understanding input_shape parameter in LSTM with Keras

dimensionskeraslstm

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data.

I have as input a matrix of sequences of 25 possible characters encoded in integers to a padded sequence of maximum length 31. As a result, my x_train has the shape (1085420, 31) meaning (n_observations, sequence_length).

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))

# Generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))

model.fit(x_train, y_train,
          batch_size=64, epochs=5,
          validation_data=(x_val, y_val))

In this code x_train has the shape (1000, 8, 16), as for an array of 1000 arrays of 8 arrays of 16 elements. There I get completely lost on what is what and how my data can reach this shape.

Looking at Keras doc and various tutorials and Q&A, it seems I'm missing something obvious. Can someone give me a hint of what to look for ?

Thanks for your help !

Best Answer

LSTM shapes are tough so don't feel bad, I had to spend a couple days battling them myself:

If you will be feeding data 1 character at a time your input shape should be (31,1) since your input has 31 timesteps, 1 character each. You will need to reshape your x_train from (1085420, 31) to (1085420, 31,1) which is easily done with this command :

 x_train=x_train.reshape(x_train.shape[0],x_train.shape[1],1))
Related Question