MATLAB: Import pre-trained word embeddings (GloVe, Skipgram, etc.) in Deep Neural Network models.

deep neural networks, document classific..., Text Analytics Toolbox, word embeddings

I was going through this page to learn how to classify text using word embeddings and an LSTM. The page covers training the word embeddings within the LSTM architecture, but it does not discuss importing embeddings trained externally, such as GloVe (Global Vectors) or word2vec models, which already provide large-scale pre-trained word embeddings. Any ideas on how I can use pre-trained word embeddings in the LSTM architecture?

Best Answer

You can use a pre-trained embedding model to initialize the Weights property of the wordEmbeddingLayer. For example:
% Import your pretrained word embedding model of choice
emb = readWordEmbedding('existingEmbeddingModel.vec');
embDim = emb.Dimension;
numWords = numel(emb.Vocabulary);
% Initialize the word embedding layer
embLayer = wordEmbeddingLayer(embDim, numWords);
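% word2vec returns one row per word (numWords-by-embDim); the layer expects
% embDim-by-numWords, hence the transpose
embLayer.Weights = word2vec(emb, emb.Vocabulary)';
% If you want to keep the original weights "frozen", uncomment the following line
% embLayer.WeightLearnRateFactor = 0;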
The wordEmbeddingLayer with its initialized Weights can then be placed in the network before the lstmLayer.
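As a rough sketch of that placement, the layer array below follows the usual sequence-classification layout. It assumes embLayer, embDim and numWords from the snippet above are still in the workspace; numHiddenUnits and numClasses are hypothetical values chosen only for illustration and should be replaced with your own.

```matlab
% Hypothetical sizes, for illustration only
numHiddenUnits = 100;   % size of the LSTM hidden state
numClasses = 4;         % number of target categories in your data

layers = [
    sequenceInputLayer(1)                           % one word index per time step
    embLayer                                        % pre-trained embedding from above
    lstmLayer(numHiddenUnits,'OutputMode','last')   % keep only the final time step
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

If WeightLearnRateFactor is set to 0 on embLayer, the embedding should stay fixed during training and only the later layers are updated.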
Also note that the training documents must be mapped to word indices using the vocabulary of the pre-trained embedding model before they are passed to the network for training, so that each index selects the matching column of Weights. For example:
enc = wordEncoding(tokenizedDocument(emb.Vocabulary,'TokenizeMethod','none'));
XTrain = doc2sequence(enc,documentsTrain,'Length',75);
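Because the indices in XTrain select columns of embLayer.Weights, it may be worth confirming that the encoding and the embedding agree on word order before training. The check below is only a sketch; it assumes enc, emb and embLayer from the snippets above are in the workspace, and the tolerance is an arbitrary choice.

```matlab
% The encoding's vocabulary should match the embedding's vocabulary,
% word for word and in the same order
assert(isequal(string(enc.Vocabulary(:)), string(emb.Vocabulary(:))), ...
    'Encoding order differs from embedding order.');

% Spot-check one word: its index should select its own column of Weights
idx = word2ind(enc, emb.Vocabulary(1));
vec = word2vec(emb, emb.Vocabulary(1));
maxDiff = max(abs(double(embLayer.Weights(:,idx)) - double(vec')));
assert(maxDiff < 1e-6, 'Weights column does not match the embedding vector.');
```

If the first assertion fails, one option is to build Weights from the encoding's order instead, for example with word2vec(emb, enc.Vocabulary)'.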