I recently read a lot about neural networks using GRU or LSTM units. There are many easy-to-use frameworks, like TensorFlow, that do not require deep programming knowledge. Unfortunately, I never found good information on how the training of those networks actually works. Plain backpropagation presumably does not apply directly to gated recurrent networks, or is too inefficient for networks with such a large number of parameters to learn.
So my question is:
What are the state-of-the-art algorithms used for the initialization and training of neural networks with GRU or LSTM units? I am not looking for frameworks to use, but for initialization and update equations for the internal parameters.
Best Answer
This article is a good place to start.
"Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network" by Alex Sherstinsky
This is a dense document with all of the equations your heart might desire. It would be difficult to reproduce all of the relevant materials here.
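To give a flavor of what those equations look like, here is a minimal NumPy sketch of a single LSTM forward step in the standard formulation (the names `W`, `U`, `b` and the gate keys are generic placeholders, not Sherstinsky's notation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM forward step.
    W: input weights, U: recurrent weights, b: biases,
    each a dict keyed by gate name ('f', 'i', 'o', 'g')."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # candidate cell update
    c = f * c_prev + i * g   # new cell state
    h = o * np.tanh(c)       # new hidden state
    return h, c
```

Training is then ordinary backpropagation through time: unroll this step over the sequence and differentiate the loss with respect to `W`, `U`, and `b`; the papers above derive those gradients explicitly.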
Another presentation can be found in "A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation" by Gang Chen.
Also, if you're unfamiliar with backpropagation, we have a number of threads on the topic.
Regarding GRUs, I'm not aware of a similarly thorough paper. The promise of GRUs was that they would match LSTM performance with fewer parameters and fewer computations; in practice, results are mixed. For a comparison of LSTMs and GRUs, see Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling."
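For completeness, a GRU step in the same sketch style, along with the parameter-count comparison behind that promise (three gated transformations instead of four; the interpolation convention for the update gate varies between papers, so treat the sign of `z` here as one common choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU forward step. Gate keys: 'z' update, 'r' reset, 'h' candidate."""
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])            # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])            # reset gate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])  # candidate
    return (1 - z) * h_prev + z * h_tilde                         # interpolation

# Parameter counts for hidden size n, input size d:
def lstm_params(n, d):
    return 4 * (n * d + n * n + n)  # four gated transformations

def gru_params(n, d):
    return 3 * (n * d + n * n + n)  # three gated transformations
```

So a GRU layer carries roughly three quarters of the parameters of an LSTM layer of the same width, which is where the efficiency claim comes from.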