Solved – LSTM output dimensionality

keras, lstm, machine learning, recurrent neural network, time series

I am new to LSTMs. When reading the papers and websites about LSTM architecture, there is something I do not get.

As I understand it, a single LSTM layer can have multiple LSTM cells (just like a regular dense layer). However, what would the shape of the output of that layer look like? Let's assume we are not returning sequences and only returning the final time-step value. With a single LSTM cell, the output of each cell is a vector, as opposed to a single dense layer cell, where the output of each cell is a scalar. With multiple LSTM cells in one layer, what would you do to the vectors to turn N vectors (with N LSTM cells in that layer) into a single vector to push into the next layer? Thank you, and sorry for my inexperience.

Best Answer

I think you've confused the dimensionality of an LSTM (the number of "units" it has) with the sequence length.

As I understand it, a single LSTM layer can have multiple LSTM cells (just like a regular dense layer)

An LSTM "cell" is just what library implementers use to describe an object that computes the LSTM update for a single time step. As such, a regular dense layer has no concept of a "cell".

However, what would the shape of the output of that layer look like?

Usually something like (batch size, sequence length, dims) if you want a sequence, and you can manually index the sequence length dimension to extract the last output.
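A minimal sketch of those shapes, using NumPy placeholders rather than an actual Keras model (the sizes below are hypothetical, chosen just for illustration):

```python
import numpy as np

# Hypothetical sizes: a batch of 4 sequences, each 10 steps long,
# passed through an LSTM with 32 units.
batch, T, units = 4, 10, 32

# With return_sequences=True, a Keras-style LSTM emits one hidden
# vector per time step: shape (batch size, sequence length, dims).
full_sequence = np.zeros((batch, T, units))

# Without return_sequences you get only the final step, shape
# (batch, units) -- exactly the sequence indexed at its last step.
last_step = full_sequence[:, -1, :]

print(full_sequence.shape)  # (4, 10, 32)
print(last_step.shape)      # (4, 32)
```

The `[:, -1, :]` indexing is the "manually extract the last output" step mentioned above.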

So the "cell" count is equal to the time step count that it is on, while the unit count is the number of LSTMs per layer?

No, not quite. There is just one cell, which is invoked $T$ times to process a sequence of length $T$.

The output of an LSTM unit is a vector, which can easily be taken as an input into a following dense layer. However, if our LSTM layer consists of N LSTM units, each producing a vector as output, assuming we are not returning sequences but just the final time step output, how do we combine those N output vectors into just one vector to push into the following dense layer? Thank you.

An LSTM almost always has $N > 1$ units. An LSTM with one unit would have a scalar output. It's usually more helpful to think of an LSTM as having a state of a certain dimension $d$, rather than to think of it as having $d$ "units" inside.
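In other words, there is no stack of $N$ separate output vectors to combine: the final LSTM output is a single $d$-dimensional vector per example, which feeds straight into a dense layer. A quick sketch with hypothetical sizes:

```python
import numpy as np

# Hypothetical sizes: the LSTM's final hidden state for a batch of 4
# examples, with state dimension d = 32, going into a 10-way dense layer.
rng = np.random.default_rng(1)
batch, d, n_out = 4, 32, 10

h_last = rng.standard_normal((batch, d))  # final LSTM output: (batch, d)
W_dense = rng.standard_normal((d, n_out))
b_dense = np.zeros(n_out)

logits = h_last @ W_dense + b_dense       # dense layer applied directly
print(logits.shape)                       # (4, 10)
```

If $d$ were 1, `h_last` would be `(batch, 1)`, i.e. a scalar output per example, which is why single-unit LSTMs are rarely useful.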
