Neural Networks – Advantages of Stacking Multiple LSTMs in Deep Learning

classification, deep-learning, lstm, neural-networks, recurrent-neural-network

What are the advantages of using multiple LSTMs, stacked side by side, in a deep network? Why would one do this? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through again?

I am asking because I saw this done in a natural-language generation program.

Best Answer

I think that you are referring to vertically stacked LSTM layers (assuming the horizontal axis is the time axis).

In that case, the main reason for stacking LSTMs is to allow for greater model complexity. In a simple feedforward network, we stack layers to create a hierarchical feature representation of the input data, which is then used for some machine learning task. The same applies to stacked LSTMs, as in the sketch below.
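For concreteness, here is a minimal Keras sketch of two vertically stacked LSTM layers (the sequence length, feature count, layer sizes, and the binary classification head are illustrative assumptions, not from the question):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # First LSTM returns its full output sequence (one vector per time
    # step), so the second LSTM still sees a sequence, not a single vector.
    LSTM(32, return_sequences=True, input_shape=(20, 8)),
    # Second LSTM consumes those per-step features and compresses the
    # whole sequence into one final representation.
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```

The key detail is `return_sequences=True` on every LSTM except the last: without it, a layer emits only its final hidden state, and there is no sequence left for the next LSTM to process.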

At every time step, an LSTM receives a new input in addition to its recurrent input from the previous time step. If that input is already the output of an LSTM layer (or of a feedforward layer), then the current LSTM can build a more complex feature representation of the current input.

Now, the difference between placing a feedforward layer between the feature input and the LSTM layer and placing another LSTM layer there is that a feedforward layer (say, a fully connected layer) receives no feedback from its previous time step and thus cannot account for certain temporal patterns. Using an LSTM instead (i.e., a stacked LSTM architecture), more complex input patterns can be described at every layer.
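A sketch contrasting the two options (again with illustrative layer sizes; `TimeDistributed` applies the same dense layer independently at each time step):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

# Variant A: a dense layer applied independently at each time step.
# It transforms each step's features but has no memory of earlier steps.
feedforward_between = Sequential([
    LSTM(32, return_sequences=True, input_shape=(20, 8)),
    TimeDistributed(Dense(32, activation="relu")),
    LSTM(32),
])

# Variant B: a second LSTM instead of the dense layer. Its recurrent
# state lets it model temporal patterns in the first layer's outputs.
stacked_lstm = Sequential([
    LSTM(32, return_sequences=True, input_shape=(20, 8)),
    LSTM(32, return_sequences=True),
    LSTM(32),
])
```

In variant A, the intermediate dense layer sees only one time step at a time, so any pattern spanning several steps must be captured entirely by the surrounding LSTMs; in variant B, every layer can model such temporal structure itself.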