Solved – Number of samples vs timesteps for LSTM

keras lstm python

I'm working on a time series forecasting program using an RNN based on LSTM. If I have, for example, a price series of the last 100 days with just one feature (the price), does it make sense to shape the series as (100, 1, 1) for an LSTM? I've read this would be equivalent to flattening the data and passing it through a Dense layer. I've tried using a TimeseriesGenerator, but it's overcomplicating things for me. How should I reshape the sets if I want to add a sliding window?

This is my code:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import EarlyStopping

# Reshape to (samples, timesteps, features); here timesteps is fixed to 1
train_x_scaled = np.reshape(train_x_scaled, (train_x_scaled.shape[0], 1, train_x_scaled.shape[1]))
test_x_scaled = np.reshape(test_x_scaled, (test_x_scaled.shape[0], 1, test_x_scaled.shape[1]))

model = Sequential()
model.add(LSTM(units=16, input_shape=(train_x_scaled.shape[1], train_x_scaled.shape[2]), activation='tanh'))
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')

early_stopping_monitor = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
history = model.fit(train_x_scaled, train_y_scaled, validation_split=0.2, epochs=300,
                    callbacks=[early_stopping_monitor], shuffle=False, verbose=1)

Best Answer

RNN architectures are good at remembering previous time steps along a sequence: their recurrent loops allow information to persist. That is why, if your data has temporal dependencies, they are a better choice than dense layers alone, which do not model those dependencies.

A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop [1].

If I understood correctly, your dataset consists of one scalar value (the price) per day for the last 100 days, right? I suggest then that your input data should have the shape:

(batch_size, sequence_length, features)

In your case, features will be 1, and sequence_length is the parameter that lets the model exploit the temporal dependence. For your specific problem the question becomes: how many past days do you want the forecast to look back on? 5 days? 100 days? This is ultimately a hyperparameter you will have to tune.
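As a minimal sketch of that reshaping, assuming your raw data is already a 1-D array of scaled prices: the make_windows helper and the 5-day lookback below are just illustrative choices, not the only way to do it.

import numpy as np

def make_windows(series, window):
    """Build sliding windows: each sample holds `window` consecutive prices,
    and the target is the price on the following day."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])
        y.append(series[i + window])
    x = np.array(x).reshape(-1, window, 1)   # (samples, sequence_length, features)
    y = np.array(y)
    return x, y

# e.g. 100 daily prices with a 5-day lookback -> x has shape (95, 5, 1)
prices = np.arange(100, dtype="float32")
x, y = make_windows(prices, window=5)
print(x.shape, y.shape)   # (95, 5, 1) (95,)

With data prepared this way, the LSTM's input_shape becomes (window, 1) instead of (1, 1), so each sample spans several consecutive days rather than a single flattened step.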

[1] http://colah.github.io/posts/2015-08-Understanding-LSTMs/
