Solved – Training an Elman Recurrent Neural Network

machine learning, neural networks

I have a few doubts about training an Elman RNN using the Backpropagation Through Time algorithm.

Assume I present a sequence to the network and the network adapts its parameters based on the error gradient, including the hidden state input (context units). Now, when I present the next sequence, what should the starting hidden state input (context units) be? Should the context units be the values updated by gradient descent, or the hidden state activation obtained from the last input of the previous sequence?

PS: This is a follow-up to the question How to train Elman RNN for Temporal XOR?

Best Answer

Since temporal context is only valid within the presentation of a specific sequence, the context units should be reset to 0 at the start of each new sequence. Otherwise you would treat the last input of the previous sequence as context for the new one, which is conceptually wrong: multiple sequences are not presented in any particular order (if they were, they would effectively be one sequence).

From there, you proceed exactly as with the previous sequence: feed the values forward to the hidden layer, then copy the hidden activations into the context layer, which is used when processing the second input of the new sequence.
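
Here is a minimal NumPy sketch of this forward pass, purely to illustrate the per-sequence reset; the weight names (W_xh, W_hh, W_hy), layer sizes, and activation choices are my assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 4, 1
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def forward(sequence):
    """Process one sequence; the context starts at zero every time."""
    context = np.zeros(n_hidden)  # reset: no carry-over between sequences
    outputs = []
    for x in sequence:
        # The hidden activation becomes the context for the next step
        # *within the same sequence* only.
        hidden = np.tanh(W_xh @ x + W_hh @ context)
        outputs.append(1.0 / (1.0 + np.exp(-(W_hy @ hidden))))
        context = hidden
    return outputs

# Each call resets the context, so the order of sequences does not matter.
seq_a = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]
seq_b = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
out_a = forward(seq_a)
out_b = forward(seq_b)
```

Note that after the first input of a sequence, `context` holds that input's hidden activation, which is exactly what the second input of the same sequence uses; only the boundary between sequences triggers a reset.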