Neural Networks – Difference Between Feedback RNN and LSTM/GRU

Tags: gru, lstm, neural-networks, recurrent-neural-network

I am trying to understand the different Recurrent Neural Network (RNN) architectures for time series data, and I am getting a bit confused by the names that are frequently used when describing RNNs. Are the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures essentially just an RNN with a feedback loop?

Best Answer

All RNNs have feedback loops in the recurrent layer. This lets them maintain information in 'memory' over time. But it can be difficult to train standard RNNs to solve problems that require learning long-term temporal dependencies, because the gradient of the loss function decays exponentially with time (the vanishing gradient problem).

LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a 'memory cell' that can maintain information for long periods of time. A set of gates controls when information enters the memory, when it's output, and when it's forgotten. This architecture lets them learn longer-term dependencies.

GRUs are similar to LSTMs, but use a simplified structure. They also use a set of gates to control the flow of information, but they don't use separate memory cells, and they use fewer gates.
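To make the structural difference concrete, here is a minimal NumPy sketch of a single LSTM step and a single GRU step. It is only an illustration of the standard gate equations, not code from any particular library; the weight/bias names (`W`, `U`, `b`) and the `sigmoid` helper are my own choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate name:
    'i' (input), 'f' (forget), 'o' (output), 'c' (candidate)."""
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # what enters the memory cell
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # what is forgotten
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # what is exposed as output
    c_tilde = np.tanh(W['c'] @ x + U['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_tilde      # separate memory cell carries long-term state
    h = o * np.tanh(c)                # hidden state is read out from the cell
    return h, c

def gru_cell(x, h_prev, W, U, b):
    """One GRU step: no separate memory cell, and two gates instead of three."""
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])   # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])   # reset gate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])
    return (1.0 - z) * h_prev + z * h_tilde              # blend old state and candidate

# Tiny shape check with random weights (input size 3, hidden size 4).
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
x, h0, c0 = rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h)

W = {k: rng.normal(size=(n_h, n_in)) for k in 'ifoczrh'}
U = {k: rng.normal(size=(n_h, n_h)) for k in 'ifoczrh'}
b = {k: np.zeros(n_h) for k in 'ifoczrh'}

h_lstm, c_lstm = lstm_cell(x, h0, c0, W, U, b)
h_gru = gru_cell(x, h0, W, U, b)
print(h_lstm.shape, c_lstm.shape, h_gru.shape)   # (4,) (4,) (4,)
```

The key contrast is visible in the return values: the LSTM step carries two states (`h` and the memory cell `c`, guarded by three gates), while the GRU step carries a single hidden state updated through two gates.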

This paper gives a good overview:

Chung et al. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.