In the blog post "The Unreasonable Effectiveness of Recurrent Neural Networks", the author says that he is training "a 2-layer LSTM with 512 hidden nodes" for character prediction.
So it will look something like this:

```python
y1 = rnn1.step(x)
y = rnn2.step(y1)
```
What I don't get: what are these 512 hidden nodes in the context of an LSTM?
My first guess was that it might be the dimension of the matrices, like "self.W_hh" in the following code:
```python
class RNN:
  # ...
  def step(self, x):
    # update the hidden state
    self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
    # compute the output vector
    y = np.dot(self.W_hy, self.h)
    return y
```
So self.W_hh $\in \mathbb{R}^{\text{number of nodes} \times \text{number of nodes}}$ (I assume it has to be a square matrix, since the output has to be reused in the next timestep; is that right?). But since the input vectors are one-hot encoded characters and the output vectors are scores for character probabilities, the matrices would have to have dimension $\mathbb{R}^{\text{number of characters} \times \text{number of characters}}$, and the number of characters is 26.
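For concreteness, here is a minimal sketch of the one-hot encoding I have in mind (assuming, as above, an alphabet of just the 26 lowercase letters):

```python
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz"   # 26 characters, as assumed above

def one_hot(ch):
    # encode a single character as a length-26 vector with a single 1
    v = np.zeros(len(alphabet))
    v[alphabet.index(ch)] = 1.0
    return v

x = one_hot("h")
print(x.shape)   # (26,) -> the kind of input vector passed to rnn1.step(x)
```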
So that left me with the question: what are these nodes?
Best Answer
For fully-connected layers, the number of 'nodes' is the output dimension of the weight matrix. In other words, if we have a hidden layer whose input has dimension $d_i$ and whose output has dimension $d_h$, then the weight matrix for hidden layer 1 will be $d_i \times d_h$. $d_h$ in this case is the number of 'nodes' of hidden layer 1; it is the output dimension.
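As a minimal sketch (the concrete numbers 4 and 3 are arbitrary example values, not from the post):

```python
import numpy as np

d_i, d_h = 4, 3                 # input dimension, number of hidden 'nodes'
W1 = np.random.randn(d_i, d_h)  # weight matrix of hidden layer 1: d_i x d_h
x = np.random.randn(d_i)        # one input vector

h = np.tanh(x @ W1)             # hidden activation
print(h.shape)                  # (3,) -> one activation per 'node'
```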
In RNNs and LSTMs, these concepts are unchanged.
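Applied to the RNN code in the question, the shapes would look like this (a sketch under the question's assumptions of a 26-character vocabulary and 512 hidden nodes; note that the code computes W @ x, so each matrix is stored as output dimension x input dimension, the transpose of the $d_i \times d_h$ convention above):

```python
import numpy as np

hidden_size = 512  # the "512 hidden nodes" from the blog post
vocab_size = 26    # assumption from the question; in practice, the set of characters in the data

W_xh = np.random.randn(hidden_size, vocab_size)   # input (one-hot char) -> hidden: 512 x 26
W_hh = np.random.randn(hidden_size, hidden_size)  # hidden -> hidden: 512 x 512, the only square matrix
W_hy = np.random.randn(vocab_size, hidden_size)   # hidden -> character scores: 26 x 512

x = np.zeros(vocab_size)
x[2] = 1.0                                        # one-hot input character
h = np.zeros(hidden_size)                         # hidden state: one entry per 'node'

h = np.tanh(W_hh @ h + W_xh @ x)                  # same update as step() above
y = W_hy @ h                                      # scores over the next character
print(h.shape, y.shape)                           # (512,) (26,)
```

So only W_hh is square; the 512 is the size of the hidden state (the number of 'nodes'), and it is independent of the number of characters.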
One nuance, however: for RNNs in general, there is an embedding layer at the input and at the output. So the layers look like this: