Solved – What’s the use of the embedding matrix in a char-rnn seq2seq model

deep learning, lstm, machine learning, recurrent neural network, tensorflow

Recently, I have been looking at seq2seq models that have been used for translating from one language to another using recurrent neural networks (often with LSTM cells).

Those models can also be used to generate text, one character at a time. Based on its internal state, which summarizes the previously seen characters, the model learns a probability distribution over the next character.

When looking at the various implementations of these seq2seq models, like this one, I see that an embedding matrix is trained jointly with the neural network. As I understand it, each row of this matrix is the 'embedding' of a particular character (each character is represented by an integer: its id in a finite vocabulary).
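For concreteness, here is a minimal sketch (plain NumPy, with a made-up vocabulary and an arbitrary embedding size) of what such an embedding matrix is: a lookup table with one dense row per character id. In a real char-rnn these values would be trained jointly with the rest of the network.

```python
import numpy as np

# Hypothetical character vocabulary: each character gets an integer id.
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
vocab_size = len(vocab)      # 27 symbols in this toy example
embedding_dim = 8            # chosen arbitrarily for illustration

# The embedding matrix: one dense row per character id.
embedding_matrix = np.random.randn(vocab_size, embedding_dim).astype(np.float32)

# "Embedding" a character is just an index lookup into this matrix.
char_ids = np.array([vocab[c] for c in "hello"])
embedded = embedding_matrix[char_ids]   # shape: (5, embedding_dim)
print(embedded.shape)
```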

What is the rationale behind using this embedding? What is it used for? Why is it needed?

LSTM: Long Short-Term Memory

Best Answer

Embeddings are dense vector representations of the characters. The rationale behind using them is to convert an arbitrary discrete id into a continuous representation.
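One way to see this conversion (a sketch, not part of the original answer; sizes are illustrative): looking up row i of the embedding matrix gives the same vector as multiplying a one-hot encoding of id i by that matrix, so the embedding replaces a sparse, discrete encoding with a dense, continuous one.

```python
import numpy as np

vocab_size, embedding_dim = 27, 8             # illustrative sizes
E = np.random.randn(vocab_size, embedding_dim)

char_id = 7                                   # some discrete character id
one_hot = np.zeros(vocab_size)
one_hot[char_id] = 1.0

# The dense row lookup and the one-hot matrix product give the same vector.
assert np.allclose(E[char_id], one_hot @ E)
```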

The main advantage is that back-propagation is possible over continuous representations, while it is not over discrete representations. A second advantage is that a character's vector can carry additional information through its location relative to the other characters' vectors: characters that play similar roles can end up close together in the embedding space.
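As a hedged illustration (using tf.keras since the question is tagged tensorflow; layer sizes and data are arbitrary), a char-rnn typically places a trainable Embedding layer in front of the LSTM. Because the lookup output is continuous, gradients from the loss flow back into the embedding rows, so the character vectors are learned jointly with the rest of the model.

```python
import numpy as np
import tensorflow as tf

vocab_size = 27
embedding_dim = 8

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),  # trainable lookup table
    tf.keras.layers.LSTM(64, return_sequences=True),       # encodes character context
    tf.keras.layers.Dense(vocab_size),                     # logits over the next character
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Toy data: sequences of integer character ids; the target is the next
# character at each time step (random here, purely for illustration).
x = np.random.randint(0, vocab_size, size=(32, 20))
y = np.random.randint(0, vocab_size, size=(32, 20))
model.fit(x, y, epochs=1, verbose=0)  # the embedding rows receive gradient updates here
```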

This is still a hot area of research. If you are interested in learning more, check out the word2vec algorithms, where vector embeddings are learned for words and interesting relationships between them emerge. For example, there is an interesting write-up here: https://deeplearning4j.org/word2vec.html