I would suggest framing this as a classification problem and outputting two softmaxes, each of size 300. This usually works better than the continuous-output (regression) approach you have taken here.
You might expect this approach to work better because, for the LSTM to successfully execute the original regression approach, it would have to detect the onset and then somehow carry that information forward over several hundred time steps. In addition, there would probably have to be a counter-like mechanism embedded in the LSTM weights to figure out exactly where the detected onset was. This is all super difficult for an LSTM to learn to do.
Also for that reason, I don't recommend just taking the hidden vector from the last time step of the LSTM and computing the output from that -- instead, try doing something with the full sequence of hidden states (flatten them, for example), as in the sketch below.
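Here is a minimal sketch of what that could look like, assuming a Keras-style setup; the sequence length, feature count, and the two 300-way output heads (I'm guessing onset/offset positions) are my assumptions, not details from your post:

```python
import tensorflow as tf

# Hypothetical shapes -- adjust to your data.
seq_len, n_features, n_classes = 300, 1, 300

inputs = tf.keras.Input(shape=(seq_len, n_features))
# Keep the full sequence of hidden states rather than only the last one.
h = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
h = tf.keras.layers.Flatten()(h)

# Two independent softmax heads, each classifying one of 300 positions.
out_a = tf.keras.layers.Dense(n_classes, activation="softmax", name="onset")(h)
out_b = tf.keras.layers.Dense(n_classes, activation="softmax", name="offset")(h)

model = tf.keras.Model(inputs, [out_a, out_b])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```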
A very short answer:
LSTM decouples the cell state (typically denoted by $c$) from the hidden layer/output (typically denoted by $h$), and only does additive updates to $c$, which makes memories in $c$ more stable. Thus the gradient that flows through $c$ is preserved and hard to vanish (and therefore the overall gradient is hard to vanish). However, other paths may cause gradient explosion.
A more detailed answer with mathematical explanation:
Let's review the CEC (Constant Error Carousel) mechanism first. CEC says that from time step $t$ to $t+1$, if the forget gate is 1 (there is no forget gate in the original LSTM paper, so this is always the case), the gradient $\delta c^{t} = \partial l / \partial c^{t}$ can flow without change.
Following the BPTT formulae in the paper LSTM: A Search Space Odyssey, Appendix A.2 ($y$ in the paper is $h$ in other literature), the CEC flow corresponds to the equation $\delta c^t = \dots + \delta c^{t+1} \odot f^{t+1}$. When $f^{t+1}$ is close to 1, $\delta c^{t+1}$ accumulates into $\delta c^t$ losslessly.
However, LSTM is more than CEC. Apart from the CEC path from $c^{t}$ to $c^{t+1}$, other paths do exist between two adjacent time steps, for example $y^t \rightarrow o^{t+1} \rightarrow y^{t+1}$. Walking through the backpropagation process over two steps along this path, we have $\delta y^t \leftarrow R^T_o \delta o^{t+1} \leftarrow \delta y^{t+1} \leftarrow R^T_o \delta o^{t+2}$, so $R^T_o$ is multiplied in twice, just as in vanilla RNNs, which may cause gradient explosion. Similarly, the paths through the input and forget gates can also cause gradient explosion due to repeated multiplication by the matrices $R^T_i, R^T_f, R^T_z$.
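A tiny numerical illustration (my own sketch, with scalar states standing in for the vectors/matrices above) of why the CEC path preserves the gradient while a vanilla-RNN-style path can explode:

```python
import numpy as np

T = 200        # number of time steps to back-propagate through
grad = 1.0     # delta arriving at the last time step

# CEC path: delta c^t = delta c^{t+1} * f^{t+1}, with f close to 1.
f = 0.999
cec_grad = grad * f ** T

# Vanilla-RNN-like path: repeated multiplication by a recurrent weight
# (a scalar standing in for R_o^T); values > 1 explode, values < 1 vanish.
w = 1.1
rnn_grad = grad * w ** T

print(f"CEC-path gradient after {T} steps:       {cec_grad:.4f}")   # ~0.82, well preserved
print(f"Recurrent-path gradient after {T} steps: {rnn_grad:.2e}")   # ~1.9e+08, exploded
```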
Reference:
K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. LSTM: A Search Space Odyssey. CoRR, abs/1503.04069, 2015.
Best Answer
The terminology is unfortunately inconsistent.
num_units in TensorFlow is the size of the hidden state, i.e. the dimension of $h_t$ in the equations you gave. See also https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard9/tf.nn.rnn_cell.RNNCell.md .
"LSTM layer" is probably more explicit, example: