Solved – Structure of Recurrent Neural Network (LSTM, GRU)

lstm, neural-networks

I am trying to understand the architecture of RNNs. I found this tutorial, which has been very helpful: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Especially this image: [the diagram from that post of the repeating LSTM module, labelled A, unrolled across time steps]

How does this fit into a feed-forward network? Is this image just another node in each layer?

Best Answer

A is, in fact, a full layer. The output of the layer is $h_t$, which is the neuron output; it can be fed into a softmax layer (if you want a classification for time step $t$, for instance) or into anything else, such as another LSTM layer if you want to go deeper. The input of this layer is what sets it apart from a regular feed-forward network: it takes both the input $x_t$ and the full state of the network from the previous time step (both $h_{t-1}$ and the other variables of the LSTM cell, i.e. the cell state $c_{t-1}$).
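
To make this concrete, here is a minimal NumPy sketch of one time step of the cell A, following the gate equations in the linked post. The weight names (`W_f`, `b_f`, etc.) and the function name `lstm_step` are my own placeholders, not from the tutorial:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    x_t    : input at time t, shape (input_size,)
    h_prev : hidden state h_{t-1}, shape (hidden_size,)
    c_prev : cell state c_{t-1}, shape (hidden_size,)
    params : dict of weight matrices W_* with shape
             (hidden_size, input_size + hidden_size) and biases b_*
             (hypothetical names, for this sketch only)
    """
    # The cell sees both the current input and the previous hidden state.
    z = np.concatenate([h_prev, x_t])

    f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])   # candidate cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate

    c_t = f * c_prev + i * g     # new cell state
    h_t = o * np.tanh(c_t)       # new hidden state: the layer's output
    return h_t, c_t
```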

Note that $h_t$ is a vector. So, if you want to make an analogy with a regular feed-forward network with one hidden layer, A can be thought of as taking the place of all of the neurons in that hidden layer (plus the extra complexity of the recurrent part).
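
As a rough illustration of that analogy (again only a sketch, reusing the hypothetical `lstm_step` above): the same cell A is applied at every time step with shared weights, and its final $h_t$ can be fed into a softmax readout just like the hidden layer of a feed-forward network would be:

```python
rng = np.random.default_rng(0)
input_size, hidden_size, n_classes = 4, 8, 3

# Random placeholder parameters for the lstm_step sketch above.
params = {}
for gate in ("f", "i", "g", "o"):
    params[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden_size, input_size + hidden_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

W_out = rng.normal(0.0, 0.1, (n_classes, hidden_size))  # softmax readout weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Unroll the same cell over a toy 5-step sequence: the weights are shared
# across time steps, so A stands in for the whole hidden layer at each step.
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, params)

print(softmax(W_out @ h))  # class probabilities from the final h_t
```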
