Solved – Best way to mini-batch similar-length sequences in a corpus for RNN training

deep learning, lstm, neural networks, recurrent neural network

In order to train an LSTM/RNN in batches, the sequences in the batch need to be the same length. From my understanding, this can be done by either truncating longer sequences, padding shorter ones, or some combination of the two.
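For concreteness, here is a minimal sketch of the pad/truncate step as I understand it, using PyTorch's `pad_sequence` (the library choice and the `max_len` cap are just assumptions for illustration, not part of my actual setup):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Example: three variable-length sequences of 8-dimensional feature vectors.
seqs = [torch.randn(5, 8), torch.randn(12, 8), torch.randn(7, 8)]

# Truncate anything longer than max_len, then zero-pad to the batch maximum.
max_len = 10  # assumed cap; would be chosen from corpus statistics
truncated = [s[:max_len] for s in seqs]
batch = pad_sequence(truncated, batch_first=True)
print(batch.shape)  # torch.Size([3, 10, 8])
```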

It seems advantageous to batch sequences with similar original lengths, so that there is minimal need for padding/truncation. Two questions:

1) Is this true? Should I group similar-length sequences into the same batch for training? It seems it would be more efficient, but also it seems it may introduce some biases since similar-length sequences may share other similarities (for example, in classification, shorter sequences may be more likely to be a certain class).

2) What is the best way to select, from a corpus of N sequences of varying lengths, a subset of sequences which are "similar" in size? I was thinking of ordering the sequences by lengths, picking one randomly, and sampling its neighbors somehow.

Best Answer

Is this true? Should I group similar-length sequences into the same batch for training? It seems it would be more efficient, but also it seems it may introduce some biases since similar-length sequences may share other similarities (for example, in classification, shorter sequences may be more likely to be a certain class).

This is true. Grouping similar-length sequences into the same batch will speed up training (as well as testing), since less computation is wasted on padding tokens, but it does introduce some bias of the kind you describe.

What is the best way to select, from a corpus of N sequences of varying lengths, a subset of sequences which are "similar" in size? I was thinking of ordering the sequences by lengths, picking one randomly, and sampling its neighbors somehow.

That sounds like a good trade-off between batch randomization and speed. I'm not aware of any standard solution to this trade-off; from what I have read so far, it is an empirical decision.
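A minimal sketch of a common bucketing variant of your idea (sort by length, cut the sorted order into contiguous windows, then shuffle the windows); the function name and parameters here are hypothetical, just for illustration:

```python
import random

def length_bucketed_batches(sequences, batch_size):
    """Yield batches of sequences with similar lengths.

    Sort indices by sequence length, chop the sorted order into
    contiguous windows of batch_size neighbors, and shuffle the
    order in which the windows are presented.
    """
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    # Each window contains indices of sequences with similar lengths.
    windows = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.shuffle(windows)  # randomize batch order across the epoch
    for window in windows:
        yield [sequences[i] for i in window]

# Usage: a toy corpus of variable-length token lists.
corpus = [[0] * random.randint(3, 40) for _ in range(100)]
for batch in length_bucketed_batches(corpus, batch_size=16):
    lengths = [len(s) for s in batch]
    # Lengths within each batch are close, so padding waste is minimal.
```

Shuffling the window order each epoch restores some of the randomness lost by sorting, which helps mitigate the length-correlated bias mentioned above without reintroducing padding waste.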
