Solved – Examples of “one to many” for RNN/LSTM

lstm, machine learning, neural networks, nonlinear regression, recurrent neural network

Are there any examples dealing with the "one to many" kind of LSTM?

Basically, I am trying to build a model which takes an input vector $a$ and gives an output $[b_1; b_2; b_3; \ldots; b_n]$, where each $b_i$ is a vector. The vector $a$ and the vectors $b_1, b_2, b_3, \ldots$ have different sizes.

I can't seem to find any examples in the literature to begin understanding how to format the input and output, or even how to handle the training and testing parts. Can an RNN even deal with different input and output sizes in the first place?

Another doubt I have: a lot of blogs on RNNs state that they are difficult to train due to their complexity. Why is that?

Best Answer

The most popular example is the decoder part of the sequence-to-sequence (seq2seq) recurrent neural network (RNN). Such networks are among the most basic architectures used for machine translation. They consist of two sub-networks: an encoder RNN that takes a sentence in one language as input and encodes it into a single vector representation of the whole sentence, and a decoder RNN that uses that vector representation to produce a sentence in the target language. The decoder is exactly the "one to many" case you are asking about: it takes one fixed-size vector and unrolls it into a sequence of outputs, so the input and output sizes do not need to match.

[Figure: encoder-decoder (seq2seq) architecture, with the encoder reading the source sentence into a vector and the decoder generating the target sentence from it]

You can find many examples and tutorials on such networks online, e.g. here (the above image was taken from this blog), here, here, or here. Moreover, a Keras code example can be found on StackOverflow.
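To make the "one to many" shape concrete, here is a minimal sketch in Keras (assuming TensorFlow's Keras API; the sizes `d_a`, `d_b`, `n` and the hidden width 32 are illustrative, not from the question). It uses the `RepeatVector` trick, which feeds the same input vector to the LSTM at every time step; this is one common way to build a one-to-many decoder, the other being to use the input vector as the LSTM's initial state, as seq2seq decoders typically do.

```python
# Minimal one-to-many sketch: map one input vector of size d_a to a
# sequence of n output vectors of size d_b. All dimensions below are
# illustrative placeholders, not values from the original question.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

d_a, d_b, n = 8, 3, 5  # input size, output vector size, output sequence length

model = keras.Sequential([
    keras.Input(shape=(d_a,)),
    # Repeat the input vector so the LSTM sees it at each of the n steps.
    layers.RepeatVector(n),
    # The LSTM unrolls for n steps and returns one hidden state per step.
    layers.LSTM(32, return_sequences=True),
    # Project each hidden state to an output vector of size d_b.
    layers.TimeDistributed(layers.Dense(d_b)),
])
model.compile(optimizer="adam", loss="mse")

# Toy data: 100 random (a, [b_1, ..., b_n]) pairs, just to show the shapes.
X = np.random.rand(100, d_a)
Y = np.random.rand(100, n, d_b)
model.fit(X, Y, epochs=2, verbose=0)
print(model.predict(X[:1]).shape)  # (1, n, d_b)
```

Note that `a` enters as a plain 2D batch of vectors while the targets are 3D (batch, steps, features), which is how Keras expects sequence outputs to be formatted; this is also why the differing sizes of $a$ and the $b_i$ pose no problem.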
