What's the difference between a stacked LSTM and an encoder-decoder LSTM?

keras, lstm, neural networks, python, tensorflow

I wanted to learn about encoder-decoder LSTMs. After some digging around, I understand that the first LSTM layer in an encoder-decoder LSTM outputs its hidden state, and the next LSTM layer uses that hidden state as its initial hidden state. I get that part, but what I don't understand is how this differs from a stacked LSTM in terms of code.

The code for a normal stacked LSTM, like the one I'm currently using for a time series forecasting problem, is:

    import tensorflow as tf

    # Stacked LSTM: both layers return the full sequence
    lstm = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(16, return_sequences=True),
        tf.keras.layers.LSTM(24, return_sequences=True),
        tf.keras.layers.Dense(units=1)
    ])

Now, what I don't understand is what I need to change here to make this an encoder-decoder.

Some articles I read added something called a RepeatVector layer. However, that won't be necessary for me, since I'm returning sequences on the first layer, right?

I apologise if this is a naive question; I'm new to DL and LSTMs.

Best Answer

Good question. To turn this into an encoder-decoder, you need to turn off return_sequences in the first LSTM. It then returns only its final output, which is 2D with shape (batch, units), so you need a RepeatVector(sequence_length) layer after it to convert this back to a 3D shape, (batch, sequence_length, units), that can be passed on to the second LSTM. The rest of your network you can leave as is.

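    import tensorflow as tf

    sequence_length = 10  # example value (assumption): number of output time steps

    lstm = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(16),                         # encoder: returns only the last output (2D)
        tf.keras.layers.RepeatVector(sequence_length),    # repeat it once per output step (3D again)
        tf.keras.layers.LSTM(24, return_sequences=True),  # decoder
        tf.keras.layers.Dense(units=1)
    ])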
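
If it helps to see the 2D/3D point concretely, here is a small sketch that pushes a random batch through the same four layers one at a time and records the shape after each step. The batch size of 4, `sequence_length = 10`, and a single input feature are values I made up for illustration, not anything from your setup.

```python
import numpy as np
import tensorflow as tf

batch_size = 4        # assumed, for illustration only
sequence_length = 10  # assumed number of time steps
n_features = 1        # assumed: univariate series

# Random stand-in for a batch of input windows
x = np.random.rand(batch_size, sequence_length, n_features).astype("float32")

encoder = tf.keras.layers.LSTM(16)                         # no return_sequences
repeat = tf.keras.layers.RepeatVector(sequence_length)
decoder = tf.keras.layers.LSTM(24, return_sequences=True)
head = tf.keras.layers.Dense(units=1)

encoded = encoder(x)          # 2D: one vector per sample
repeated = repeat(encoded)    # 3D: that vector copied for every time step
decoded = decoder(repeated)   # 3D: one 24-dim output per time step
output = head(decoded)        # 3D: one value per time step

shapes = [tuple(int(d) for d in t.shape)
          for t in (encoded, repeated, decoded, output)]
print(shapes)
```

The second entry is the one that matters here: without RepeatVector, the decoder LSTM would receive a 2D tensor and refuse to run.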

I believe the stacked LSTM and the encoder-decoder LSTM should both run fine for seq2seq problems where the inputs and outputs have the same sequence length. But only the encoder-decoder LSTM will run when the input and output sequences have different lengths, because the argument to RepeatVector sets the output length independently of the input length. Note that combining stacked LSTMs with an encoder-decoder might improve your results further.
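
As a rough sketch of the different-lengths case, the snippet below trains the encoder-decoder for one epoch on random data with 12 input steps and 5 output steps. The names `input_length` and `output_length`, the sizes, and the random data are all assumptions for the demo; swap in your own windows.

```python
import numpy as np
import tensorflow as tf

input_length = 12   # assumed: time steps fed to the encoder
output_length = 5   # assumed: time steps to forecast
n_features = 1      # assumed: univariate series
n_samples = 8       # assumed: tiny dummy dataset

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(16),                         # encoder
    tf.keras.layers.RepeatVector(output_length),      # sets the output length
    tf.keras.layers.LSTM(24, return_sequences=True),  # decoder
    tf.keras.layers.Dense(units=1),
])
model.compile(optimizer="adam", loss="mse")

# Random data just to exercise the shapes, not a real forecasting task
x = np.random.rand(n_samples, input_length, n_features).astype("float32")
y = np.random.rand(n_samples, output_length, 1).astype("float32")

model.fit(x, y, epochs=1, verbose=0)
pred = model.predict(x, verbose=0)
print(pred.shape)
```

With the stacked version, the output always has as many time steps as the input, so targets of length 5 would not line up with inputs of length 12.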
