[Math] The output of an RNN

artificial intelligence, neural networks

I have recently been reading about recurrent neural networks. Some people put a layer between the hidden state and the output, while others use the hidden state itself as the output.

Which is correct, and why are both used?

Thank you!

[Image: RNN with output]

[Image: RNN output is hidden state]

Best Answer

Generally, it depends on what you want to do. I tend to view RNNs as doing two things at every time step $t$, conceptually: (1) updating their hidden state $h_t$ and (2) emitting some prediction or action $a_t$. Both of these can be useful, but for different reasons.

More specifically, let $x_t$ be the input at time $t$ in the current sequence. Then we might do something like: $$ h_t = g(x_t,h_{t-1}) $$ $$ a_t = f(h_t) $$

Notice that mathematically we could just have $f(h)=h$, in which case the output at time $t$ is the hidden state itself. This is simply a special case of the general RNN above.
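To make the two cases concrete, here is a minimal sketch in plain Python. It assumes a particular form for $g$, namely $h_t = \tanh(W_x x_t + W_h h_{t-1})$, which the answer does not specify; the names `rnn_step`, `run_rnn`, `W_x`, `W_h`, `W_o` and the dimensions are all illustrative, and the weights are random and untrained. The only point is that swapping `f` changes what counts as the output while the hidden-state update stays the same.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_x, W_h):
    """One assumed choice of g: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    pre = zip(matvec(W_x, x_t), matvec(W_h, h_prev))
    return [math.tanh(a + b) for a, b in pre]

def run_rnn(xs, h0, W_x, W_h, f=lambda h: h):
    """Return (hidden states, outputs). f defaults to the identity, so a_t = h_t."""
    h, hs, outs = h0, [], []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h)  # (1) update the hidden state
        hs.append(h)
        outs.append(f(h))               # (2) emit a_t = f(h_t)
    return hs, outs

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
W_x = rand_matrix(4, 3, rng)  # input dimension 3, hidden dimension 4 (arbitrary)
W_h = rand_matrix(4, 4, rng)
W_o = rand_matrix(2, 4, rng)  # extra output layer with output dimension 2
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
h0 = [0.0] * 4

# Case 1: f(h) = h, the output is the hidden state itself.
hs, outs_identity = run_rnn(xs, h0, W_x, W_h)

# Case 2: a separate (here linear) layer between hidden state and output.
_, outs_linear = run_rnn(xs, h0, W_x, W_h, f=lambda h: matvec(W_o, h))
```

In the first call the outputs have the hidden dimension (4); in the second they have whatever dimension the output layer produces (2 here), which is one practical reason to add the extra layer when the target does not live in the same space as the hidden state.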

For example, in reinforcement learning, $h_t$ might be the agent's internal representation of the state/environment at time $t$, while $a_t$ is the action the agent performs at time $t$. Clearly $a_t$ is a useful output (it controls the agent), but $h_t$ can also be passed to a planning algorithm, for example, or used for tasks other than determining the next action.

Or, in translation, some approaches (like this one) feed the input sentence to one RNN $R_1$ and take its final hidden state $h_T$ as the output (conceptually, it holds the network's internal representation of the full sentence), and then use $h_T$ as the starting hidden state of another RNN $R_2$. The per-time-step outputs of $R_1$ are not important. For $R_2$, the input at each step is the last word it output; i.e., $x_t=a_{t-1}$ for $R_2$. The sequence $a=(a_1,\ldots,a_N)$ from $R_2$ is the output sentence. So for $R_1$, the "output" is $h_T$, while for $R_2$ it is $a$.
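The translation setup can be sketched roughly as follows, again in plain Python with random, untrained weights. Several details here are assumptions that the answer does not state: words are integer ids fed in as one-hot vectors, $R_2$ needs some start token for its first step, each $a_t$ is chosen greedily as the highest-scoring word, and decoding stops after a fixed `max_len` rather than at an end-of-sentence token. The names `encode` and `decode` and all sizes are hypothetical.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def step(x, h, W_x, W_h):
    """Assumed hidden-state update: tanh(W_x x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def encode(tokens, W_x, W_h, hidden, vocab):
    """R_1: read the whole input; only the final hidden state h_T is returned."""
    h = [0.0] * hidden
    for tok in tokens:
        h = step(one_hot(tok, vocab), h, W_x, W_h)
    return h

def decode(h_T, W_x, W_h, W_o, vocab, start_tok, max_len):
    """R_2: start from h_T and feed back the previous output, x_t = a_{t-1}."""
    h, prev, out = h_T, start_tok, []
    for _ in range(max_len):
        h = step(one_hot(prev, vocab), h, W_x, W_h)
        scores = matvec(W_o, h)
        prev = max(range(vocab), key=lambda i: scores[i])  # greedy pick of a_t
        out.append(prev)
    return out

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB, HIDDEN = 6, 4  # arbitrary illustrative sizes
enc_Wx, enc_Wh = rand_matrix(HIDDEN, VOCAB, rng), rand_matrix(HIDDEN, HIDDEN, rng)
dec_Wx, dec_Wh = rand_matrix(HIDDEN, VOCAB, rng), rand_matrix(HIDDEN, HIDDEN, rng)
dec_Wo = rand_matrix(VOCAB, HIDDEN, rng)

h_T = encode([1, 3, 2], enc_Wx, enc_Wh, HIDDEN, VOCAB)             # "output" of R_1
a = decode(h_T, dec_Wx, dec_Wh, dec_Wo, VOCAB, start_tok=0, max_len=5)  # output of R_2
```

Note that `encode` never applies an output layer at all, while `decode` needs one (`dec_Wo`) to turn each hidden state into a word, which mirrors the distinction drawn above.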

All of this is just to say that both are useful outputs and are used as such in different ways in practice.
