How to choose between plain vanilla RNN and LSTM RNN when modelling a time series

Tags: lstm, neural networks, recurrent neural network, time series

What are the criteria used to choose between plain vanilla RNN and LSTM RNN when you have to model a generic time series?

Best Answer

Empirically. The criterion is performance on the validation set. Typically an LSTM outperforms a plain RNN, as it does a better job of avoiding the vanishing gradient problem and can therefore model longer dependencies. Other RNN variants, e.g. the GRU, sometimes outperform the LSTM on some tasks.
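As a rough illustration of what "choose on the validation set" can look like in practice, here is a minimal PyTorch sketch. Everything in it is an assumption made for the example rather than part of the answer: the noisy sine series, the window length of 20, the hidden size of 16, the 50 epochs, and the helper names `make_windows`, `Forecaster` and `validation_mse`. The important parts are the chronological split (no shuffling) and training every candidate under the same budget.

```python
import torch
from torch import nn


def make_windows(series, window):
    """Turn a 1-D tensor into (input window, next value) pairs."""
    xs = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    return xs.unsqueeze(-1), ys.unsqueeze(-1)  # shapes (N, window, 1) and (N, 1)


class Forecaster(nn.Module):
    """One recurrent layer followed by a linear readout of the last time step."""

    def __init__(self, cell, hidden_size=16):
        super().__init__()
        self.rnn = cell(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.rnn(x)          # same call works for nn.RNN, nn.LSTM, nn.GRU
        return self.head(out[:, -1])  # predict from the final hidden state


def validation_mse(cell, train, val, epochs=50, seed=0):
    """Train on `train` with full-batch Adam, return the MSE on `val`."""
    torch.manual_seed(seed)
    model = Forecaster(cell)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    x_train, y_train = train
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        x_val, y_val = val
        return loss_fn(model(x_val), y_val).item()


# Toy data (an assumption for illustration): a noisy sine wave.
torch.manual_seed(0)
t = torch.arange(0, 400, dtype=torch.float32)
series = torch.sin(0.1 * t) + 0.1 * torch.randn(400)

x, y = make_windows(series, window=20)
split = int(0.8 * len(x))  # chronological split: the validation set is the most recent 20%
train, val = (x[:split], y[:split]), (x[split:], y[split:])

candidates = {"RNN": nn.RNN, "LSTM": nn.LSTM, "GRU": nn.GRU}
scores = {name: validation_mse(cell, train, val) for name, cell in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)
```

On a short, smooth series like this one the three cells may well score similarly; the gap tends to matter when the useful dependencies span many time steps. For a real comparison you would also tune the hyperparameters of each candidate separately and repeat over several seeds before trusting the ranking.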


FYI:

  • Greff, Klaus, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. "LSTM: A search space odyssey." arXiv preprint arXiv:1503.04069 (2015): "In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search and their importance was assessed using the powerful fANOVA framework".
  • Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures." (2015): used evolutionary computation to search for better-performing RNN structures.
  • Bayer, Justin, Daan Wierstra, Julian Togelius, and Jürgen Schmidhuber. "Evolving memory cell structures for sequence learning." In International Conference on Artificial Neural Networks, pp. 755-764. Springer Berlin Heidelberg, 2009: used evolutionary computation to find optimal RNN structures.
  • Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. "A simple way to initialize recurrent networks of rectified linear units." arXiv preprint arXiv:1504.00941 (2015): shows that plain RNNs with ReLU units can sometimes perform similarly to LSTMs when the recurrent weight matrix is initialized to the identity matrix.
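
The identity initialization from the last reference is easy to try on a plain RNN. The following is only a sketch in PyTorch, not the authors' code: the hidden size of 8 and the 50-step zero input are arbitrary choices for the demonstration, and the input-to-hidden weights are left at PyTorch's default initialization.

```python
import torch
from torch import nn

hidden_size = 8  # arbitrary size chosen for illustration

# A plain RNN with ReLU units instead of the default tanh.
rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
             nonlinearity="relu", batch_first=True)

with torch.no_grad():
    nn.init.eye_(rnn.weight_hh_l0)   # recurrent weight matrix = identity
    nn.init.zeros_(rnn.bias_hh_l0)   # both bias vectors = 0
    nn.init.zeros_(rnn.bias_ih_l0)

# With zero input, h_t = relu(I @ h_{t-1}) = h_{t-1} for a non-negative state,
# so the hidden state is carried forward without shrinking at each step.
h0 = torch.rand(1, 1, hidden_size)   # (layers, batch, hidden), values in [0, 1)
zeros = torch.zeros(1, 50, 1)        # (batch, time, features)
_, h_final = rnn(zeros, h0)
print(torch.allclose(h_final, h0))
```

At initialization the state after 50 steps equals the initial state, which is the intuition for why this setup can retain information over long spans. Whether it matches an LSTM on a given time series is still an empirical question to settle on the validation set.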