Solved – LSTM for Time Series: lags, timesteps, epochs, batchsize

deep-learning, lstm, machine-learning, neural-networks, python

I was working through the various Machine Learning Mastery tutorials, but I got very confused. Some answers (for instance this one, and many others) helped me, but I am still confused.

What is the difference between batch_size, timesteps, and lags, and what are the correct input dimensions?

I will provide you with an example.
I have a time series

timeSeries = np.array([4,6,1,4,1,6,8,4,3,1,9,8,6,7,7,5])

I want to do some predictions with it, using LSTM in Keras.

Predicting value at t

What are the batch_sizes, timesteps, epochs etc if I want to use past values to predict the one at t?

Suppose I want to use t-2 and t-1 to predict t. Then I can create these training datasets:

xtrain = np.array([[4,6],
                   [6,1],
                   [1,4],
                   [4,1],
                   [1,6],
                   [6,8],
                   [8,4],
                   [4,3],
                   [3,1],
                   [1,9],
                   [9,8],
                   [8,6],
                   [6,7],
                   [7,7]])

ytrain = np.array([1,
                   4,
                   1,
                   6,
                   8,
                   4,
                   3,
                   1,
                   9,
                   8,
                   6,
                   7,
                   7,
                   5])

Each column/feature of xtrain is lagged by one step relative to ytrain. This means that the first column of xtrain contains the values at t-2, while the second column contains the values at t-1.
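For what it's worth, these lagged arrays can be built from the series with a simple sliding window; this is just a sketch, and the `n_lags` name is mine:

```python
import numpy as np

timeSeries = np.array([4, 6, 1, 4, 1, 6, 8, 4, 3, 1, 9, 8, 6, 7, 7, 5])
n_lags = 2  # use t-2 and t-1 to predict t

# Each row of xtrain holds [value at t-2, value at t-1];
# the matching entry of ytrain holds the value at t.
xtrain = np.array([timeSeries[i:i + n_lags] for i in range(len(timeSeries) - n_lags)])
ytrain = timeSeries[n_lags:]

print(xtrain.shape)  # (14, 2)
print(ytrain.shape)  # (14,)
```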

This is how I would set up the model:

model = Sequential()
model.add(LSTM(number_units, input_shape = (samples, timesteps, features)))
model.add(Dense(1))
model.compile(loss= 'mse', optimizer = 'adam')

From my understanding, samples would be len(xtrain) = 14 and features = xtrain.shape[1] = 2. But what would timesteps be?

The lag between the ytrain and the second column of xtrain is 1, and the lag between the second column of xtrain and the first column of xtrain is one again. So I am tempted to say that timesteps is 1? But surely it means something else. So what does it mean?

Also, if I put 1, I would have

model = Sequential()
model.add(LSTM(number_units, input_shape = (14, 1, 2)))
model.add(Dense(1))
model.compile(loss= 'mse', optimizer = 'adam')

and to fit the model, I would have

model.fit(xtrain.reshape(xtrain.shape[0], 1, xtrain.shape[1]), ytrain, epochs = e, batch_size = bs)

What would the batch size and the epochs be in this case? Normally an epoch is one pass of the NN through the whole xtrain, while batch_size is the number of training examples after which the model updates the weights. But does that even make sense for an LSTM?
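For reference, the reshape in that fit call turns the 2-D lagged array into the 3-D layout the LSTM expects; a NumPy-only sketch (the values here are placeholders with the same shape):

```python
import numpy as np

# Stand-in for xtrain above: 14 samples, each with the values at t-2 and t-1
xtrain = np.arange(28).reshape(14, 2)

# Interpreting the two lags as two *features* seen at a single time step:
x3d = xtrain.reshape(xtrain.shape[0], 1, xtrain.shape[1])
print(x3d.shape)  # (14, 1, 2) -> (samples, timesteps, features)
```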

So if I set batch_size equal to 3 for instance, what would the model actually do?

My understanding is:

it will take

[[4,6],
 [6,1],
 [1,4]]

feed this into the LSTM, and update the weights.
Then it would take

[[4,1],
 [1,6],
 [6,8]]

and update the weights, and so on. After it reaches [[6,7], [7,7]], it counts that as one epoch. Is this correct?
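If that reading is right, the slicing of one epoch into batches could be sketched like this (NumPy only; note that Keras shuffles samples between epochs by default, so the actual order may differ):

```python
import numpy as np

xtrain = np.arange(28).reshape(14, 2)  # stand-in for the 14 lagged samples
batch_size = 3

# One epoch = one full pass over xtrain; weights update after each batch.
batches = [xtrain[i:i + batch_size] for i in range(0, len(xtrain), batch_size)]
print(len(batches))       # 5 batches per epoch
print(batches[-1].shape)  # (2, 2) -- the last batch is smaller
```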

And what would change if I had put timesteps = 2?

What would have happened if I wanted to predict t, t+1, etc.? Would this influence the timesteps?

Best Answer

I also had this question before. At a higher level, in (samples, time steps, features):

  1. samples is the number of data points, i.e., how many rows there are in your data set
  2. time steps is the number of steps the model/LSTM reads per sample
  3. features is the number of columns of each sample

For me, a better example for understanding this is NLP. Suppose you have a sentence to process. Here, samples is 1, meaning there is 1 sentence to read; time steps is the number of words in that sentence, since you feed the sentence in word by word until the model has read all the words and has the whole context of the sentence; and features is the dimension of each word, because in word embeddings like word2vec or GloVe, each word is represented by a vector with multiple dimensions.
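The sentence analogy in shapes, assuming a made-up 5-word sentence with 300-dimensional embeddings (both numbers are hypothetical):

```python
import numpy as np

n_words, embed_dim = 5, 300  # hypothetical sentence length and embedding size
# One sentence, read word by word, each word a 300-dim vector:
sentence = np.zeros((1, n_words, embed_dim))  # (samples, time_steps, features)
print(sentence.shape)  # (1, 5, 300)
```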

The input_shape parameter in Keras is only (time_steps, num_features); for more, you can refer to this. That's basically how I understand it; I hope this makes it clear for you.
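To tie this back to the question: since input_shape omits the sample axis, it is just the shape of one reshaped sample. Both readings of the example data can be sketched with NumPy alone (placeholder values, same shapes as above):

```python
import numpy as np

xtrain = np.arange(28).reshape(14, 2)  # stand-in for the 14 lagged samples

# Option A: one time step, two features per step
a = xtrain.reshape(14, 1, 2)
# Option B: two time steps (t-2 then t-1), one feature per step
b = xtrain.reshape(14, 2, 1)

# input_shape for Keras is the shape without the samples axis:
print(a.shape[1:])  # (1, 2)
print(b.shape[1:])  # (2, 1)
```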