Classification – How to Feed an LSTM Network with a Mini Batch and When to Reset LSTM State

My classification problem is the following: I have a sequence of features, which is used to predict one of 200 classes. I'm trying to use RNNs (more specifically, LSTMs).

In each learning iteration my framework processes a mini-batch (B feature sequences, each of length N). Within an iteration, the sequences are fed into the network one time step at a time, which gives N loop iterations with B feature vectors (one per sequence) fed at each step.
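The feeding scheme described above can be sketched with plain NumPy. This is only an illustration of the data layout, not the framework's own code: the sizes `B`, `N`, `F` and the helper `time_steps` are hypothetical names chosen for the example, and the mini-batch is assumed to be stored as an array of shape (B, N, F).

```python
import numpy as np

# Hypothetical sizes: B sequences per mini-batch, N time steps, F features per step.
B, N, F = 4, 6, 3
rng = np.random.default_rng(0)
batch = rng.normal(size=(B, N, F))  # one mini-batch of feature sequences


def time_steps(batch):
    """Yield one (B, F) slice per time step: the t-th feature of every sequence."""
    for t in range(batch.shape[1]):
        yield batch[:, t, :]


steps = list(time_steps(batch))
print(len(steps), steps[0].shape)  # 6 (4, 3)
```

Each of the N slices holds the feature vector at the same position in all B sequences, so row `b` of every slice always belongs to sequence `b`.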

My actual question is about the basic training process for LSTMs: when should I reset the state of the LSTM? Do I have to reset it at every iteration, i.e. for each mini-batch? Or is the reset done only once, before training starts?

My first thoughts are the following: if I reset the state at each learning iteration, then the LSTM calculates the new state based on the B feature sequences, which are not necessarily from the same class. Would it be better for training to have samples (feature sequences) from the same class in one mini-batch?

EDIT: After some discussion with my colleagues and some investigation of the framework I am using (Chainer), I have found out a few things. First, as you said, the state should be reset for every mini-batch. Other frameworks, like TensorFlow, do this reset automatically before each pass through the net. The second part of my question was really about whether the state of an LSTM is shared across all samples in the mini-batch. The answer is NO. In Chainer, the LSTM keeps B separate states, one for each sequence. Finally, after rethinking my question, the answers are actually quite obvious, but when you are new to something, everything is unclear.
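Both findings can be checked with a small NumPy sketch. It is a minimal illustration under stated assumptions, not Chainer's implementation: `TinyLSTM` and `run_batch` are hypothetical names, the weights are random, and only the forward pass is shown. The state is stored with shape (B, hidden), one row per sequence, and is reset at the start of every mini-batch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Hypothetical minimal LSTM layer keeping one (h, c) row per sequence."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden
        # One weight matrix for all four gates, applied to [x, h].
        self.W = rng.normal(scale=0.1, size=(n_in + n_hidden, 4 * n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.h = None
        self.c = None

    def reset_state(self):
        """Forget the hidden and cell state (called once per mini-batch)."""
        self.h = None
        self.c = None

    def step(self, x):
        """Consume one (B, n_in) time-step slice and return the new h of shape (B, n_hidden)."""
        if self.h is None:
            # Fresh state: B independent zero rows, one per sequence.
            self.h = np.zeros((x.shape[0], self.n_hidden))
            self.c = np.zeros((x.shape[0], self.n_hidden))
        z = np.concatenate([x, self.h], axis=1) @ self.W + self.b
        i, f, o, g = np.split(z, 4, axis=1)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return self.h


def run_batch(lstm, batch):
    """Reset the state, then feed a (B, N, F) mini-batch one time step at a time."""
    lstm.reset_state()
    for t in range(batch.shape[1]):
        h = lstm.step(batch[:, t, :])
    return h


rng = np.random.default_rng(1)
batch = rng.normal(size=(4, 6, 3))  # B=4 sequences, N=6 steps, F=3 features
lstm = TinyLSTM(n_in=3, n_hidden=5)

h_batch = run_batch(lstm, batch)      # final states when all 4 sequences share a batch
h_alone = run_batch(lstm, batch[:1])  # final state of sequence 0 processed on its own
same = np.allclose(h_batch[0], h_alone[0])
print(h_batch.shape, same)
```

In this sketch the final state of sequence 0 is the same whether it is processed alone or together with three other sequences, because every operation acts row by row. Under that assumption, the classes of the other sequences in the mini-batch do not affect a sequence's state in the forward pass.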
