I was working through the various Machine Learning Mastery tutorials but got very confused. Some answers (for instance this one, and many others) helped me, but I am still confused.
Difference between batch_size, timesteps, lags and what are the correct input dimensions?
I will provide you with an example.
I have a time series
timeSeries = np.array([[4,6,1,4,1,6,8,4,3,1,9,8,6,7,7,5]])
I want to do some predictions with it, using LSTM in Keras.
Predicting value at t
What are the batch_sizes, timesteps, epochs etc if I want to use past values to predict the one at t
?
Suppose I want to use t-2 and t-1 to predict t. Then I can create these training datasets:
xtrain = np.array([[4, 6],
                   [6, 1],
                   [1, 4],
                   [4, 1],
                   [1, 6],
                   [6, 8],
                   [8, 4],
                   [4, 3],
                   [3, 1],
                   [1, 9],
                   [9, 8],
                   [8, 6],
                   [6, 7],
                   [7, 7]])
ytrain = np.array([[1, 4, 1, 6, 8, 4, 3, 1, 9, 8, 6, 7, 7, 5]])
Each column/feature in xtrain is lagged by one step relative to ytrain. This means that the first column of xtrain contains the values at t-2, while the second column of xtrain contains the values at t-1.
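The lagged construction above can be sketched with a sliding window over the series (a minimal numpy sketch; the `window` name is mine, not from the original post):

```python
import numpy as np

# Build lagged training pairs from the series with a sliding window:
# each row of xtrain holds the `window` past values, ytrain holds the next one.
series = np.array([4, 6, 1, 4, 1, 6, 8, 4, 3, 1, 9, 8, 6, 7, 7, 5])
window = 2

xtrain = np.array([series[i:i + window] for i in range(len(series) - window)])
ytrain = series[window:]

print(xtrain.shape)  # (14, 2)
print(ytrain.shape)  # (14,)
print(xtrain[0], ytrain[0])  # [4 6] 1  -- t-2 and t-1 predicting t
```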
This is how I would set up the model:
model = Sequential()
model.add(LSTM(number_units, input_shape=(samples, timesteps, features)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
From my understanding, samples would be equal to len(xtrain) = 14, and features to xtrain.shape[1] = 2. But what would timesteps be? The lag between ytrain and the second column of xtrain is 1, and the lag between the second column of xtrain and the first column of xtrain is 1 again. So I am tempted to say that timesteps is 1? But surely it means something else. So what does it mean?
Also, if I put 1, I would have
model = Sequential()
model.add(LSTM(number_units, input_shape=(14, 1, 2)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
and to fit the model, I would have
model.fit(xtrain.reshape(xtrain.shape[0], 1, xtrain.shape[1]), ytrain, epochs=e, batch_size=bs)
What would the batch size and the epochs be in this case? Normally an epoch is when the NN has gone through the whole xtrain, while a batch_size is the number of training examples after which the model updates the weights. But does that even make sense in an LSTM? If I set batch_size equal to 3, for instance, what would the model actually do?
My understanding is:
it will take
[[4,6], [6,1], [1,4]]
feed this into the LSTM, and update the weights. Then it would take
[[4,1], [1,6], [6,8]]
and update the weights, and so on. After it arrives at [[6,7], [7,7]], it will count this as one epoch. Is this correct?
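The batching I am describing can be sketched in plain numpy (row indices stand in for the 14 training rows; the names here are mine, just for illustration):

```python
import numpy as np

# One epoch = one full pass over the data, taken in chunks of batch_size;
# the weights would be updated once per chunk.
rows = np.arange(14)  # stand-in for the 14 rows of xtrain
bs = 3

batches = [rows[i:i + bs] for i in range(0, len(rows), bs)]
print(len(batches))      # 5 updates per epoch: 3 + 3 + 3 + 3 + 2
print(len(batches[-1]))  # 2 -- the last batch is partial
```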
And what would change if I had put timesteps = 2?
What would happen if I wanted to predict t, t+1, etc.? Would this influence the timesteps?
Best Answer
I also had this question before. On a higher level, in (samples, time steps, features):

- samples is the number of data points, i.e. how many rows there are in your data set;
- time steps is the number of times inputs are fed to the model/LSTM;
- features is the number of columns of each sample.

For me, a better example for understanding it comes from NLP. Suppose you have a sentence to process: then samples is 1, meaning 1 sentence to read; time steps is the number of words in that sentence, since you feed in the sentence word by word until the model has read all the words and gets the whole context of the sentence; features is the dimension of each word, because in word embeddings like word2vec or GloVe each word is represented by a vector with multiple dimensions.

The input_shape parameter in Keras is only (time_steps, num_features); for more you can refer to this. That's basically how I understand it; I hope this makes it clear for you.
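Concretely, for the xtrain in the question there are two ways to lay out the same lagged rows in the (samples, time_steps, features) format, and they mean different things to the LSTM. A minimal numpy sketch (only the first three rows, for brevity):

```python
import numpy as np

xtrain = np.array([[4, 6], [6, 1], [1, 4]])  # first three lagged rows

# Option A: one time step, the two lags treated as two features;
# input_shape for the LSTM layer would then be (1, 2) -- no samples dimension.
a = xtrain.reshape(xtrain.shape[0], 1, 2)

# Option B: two time steps of one feature each, so the LSTM reads the
# lags as a sequence; input_shape would then be (2, 1).
b = xtrain.reshape(xtrain.shape[0], 2, 1)

print(a.shape)  # (3, 1, 2)
print(b.shape)  # (3, 2, 1)
```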