Solved – Selecting number of time lags for input in LSTM networks

lstm · recurrent neural network · time series

I know from theory that LSTMs are meant to selectively capture long- and short-term dependencies in a sequence. I'm trying to implement an LSTM for a time-series task, and I notice that many tutorials on the web use the target sequence lagged by one step as the input, without including observations further back in time (at each time step).

What I don't understand is whether the distinctive properties of LSTMs described above are exploited properly this way. Can an LSTM capture long-term dependencies if it is only fed the most recent observation? Is this handled automatically through the LSTM's internal state, or do I need to feed the network a window of past time lags within which I want it to capture long- and short-term dependencies?

Best Answer

As you mention, RNNs and LSTMs are meant to capture time dependencies in time-series data, so feeding in an input with only one time step does not make much sense (unless you are using a stateful LSTM, which is a different story).
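To make the distinction concrete, here is a minimal Keras sketch (layer sizes are illustrative, not a recommendation): with one time step per sample, a stateless LSTM has no temporal context within a sample; `stateful=True` is the exception, at the cost of a fixed batch size and time-ordered batches.

```python
import tensorflow as tf

# Stateless (the default): the internal state is reset after every sample,
# so with timesteps=1 the layer sees no temporal context at all.
stateless = tf.keras.layers.LSTM(32, input_shape=(1, 1))

# Stateful: the state is carried across batches, but the batch size must be
# fixed and consecutive batches must follow each other in time.
stateful = tf.keras.layers.LSTM(32, stateful=True, batch_input_shape=(1, 1, 1))
```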

Here is an example: we have a product and want to forecast its sales from historical data. We then choose the number of time steps on which to base each prediction; for instance, given 7 days of sales, predict the sales of the 8th day. The input would then have shape (N, ts, 1) and the output shape (N, 1), where N is the total number of samples (so each sample has 7 days of sales as input and the next day's sales as output).
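A minimal sketch of that setup in Keras, assuming a made-up daily-sales series (any 1-D float array works, and the layer sizes are illustrative, not tuned):

```python
import numpy as np
import tensorflow as tf

# A hypothetical year of daily sales, standing in for real historical data.
sales = np.random.rand(365).astype("float32")

ts = 7  # window length: 7 days of sales per sample

# Slide a window over the series to build (N, ts, 1) inputs and (N, 1) targets.
X = np.array([sales[i : i + ts] for i in range(len(sales) - ts)])
y = sales[ts:]
X = X[..., np.newaxis]   # shape (N, 7, 1): N samples, 7 time steps, 1 feature
y = y[:, np.newaxis]     # shape (N, 1): the next day's sales

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(ts, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, verbose=0)
```

Each row of `X` is one 7-day window, so the LSTM can learn dependencies within that window rather than being shown a single lagged value.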

I am not sure which tutorials you are referring to, but this one might have a better example.