I wanted to learn about the encoder-decoder LSTM, and after some digging around I understand that the first LSTM layer (the encoder) outputs its hidden state and the next LSTM layer (the decoder) uses that hidden state as its initial hidden state. That part makes sense, but what I don't understand is: what's the difference between this and a stacked LSTM in terms of code?
The code for a normal stacked LSTM, like the one I'm currently using for a time series forecasting problem, is:
import tensorflow as tf

lstm = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(16, return_sequences=True),
    tf.keras.layers.LSTM(24, return_sequences=True),
    tf.keras.layers.Dense(units=1)
])
Now, what I don't understand is: what do I change here to make this an encoder-decoder?
Some articles I read added something called a RepeatVector layer, but that won't be necessary for me since I'm returning sequences on the first layer, right?
I apologise if this is a naive question; I'm new to DL and LSTMs.
Best Answer
Good question. Yes, to turn this into an encoder-decoder you need to turn off return_sequences in the first LSTM. That layer will then produce a 2D output of shape (batch, units), so you need a RepeatVector(sequence_length) layer after it to repeat that vector back into a 3D shape that can be passed on to the second LSTM. The rest of your network can stay as it is.
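As an illustration, here is a minimal sketch of that change applied to your model. The forecast horizon output_length and the layer sizes are placeholders I've assumed, not values from your post:

import tensorflow as tf

output_length = 24  # hypothetical: number of time steps to forecast

encoder_decoder = tf.keras.models.Sequential([
    # Encoder: return_sequences is off (the default), so this emits a
    # single 2D vector of shape (batch, 16) summarising the input sequence.
    tf.keras.layers.LSTM(16),
    # Repeat that vector once per output time step, restoring a 3D shape
    # (batch, output_length, 16) for the decoder LSTM.
    tf.keras.layers.RepeatVector(output_length),
    # Decoder: returns a sequence so the Dense layer can produce one
    # value per output time step.
    tf.keras.layers.LSTM(24, return_sequences=True),
    tf.keras.layers.Dense(units=1)
])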
I believe both the stacked LSTM and the encoder-decoder LSTM will run fine for seq2seq problems where the input and output sequences have the same length. But only the encoder-decoder LSTM will work when the input and output sequences have different lengths, because RepeatVector lets you pick the output length independently of the input length. Note that combining stacked LSTMs with the encoder-decoder structure might improve your results further; a sketch of that follows.
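A rough sketch of that combination, again with placeholder layer sizes and output_length, could stack two LSTMs on each side of the RepeatVector:

import tensorflow as tf

output_length = 12  # hypothetical: forecast fewer steps than you feed in

stacked_encoder_decoder = tf.keras.models.Sequential([
    # Encoder stack: the first layer returns sequences so the second
    # encoder layer can consume them; the second returns a single vector.
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.LSTM(16),
    # Bridge: repeat the encoding once per output time step.
    tf.keras.layers.RepeatVector(output_length),
    # Decoder stack: both layers return sequences so every output step
    # gets its own prediction from the final Dense layer.
    tf.keras.layers.LSTM(16, return_sequences=True),
    tf.keras.layers.LSTM(24, return_sequences=True),
    tf.keras.layers.Dense(units=1)
])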