Solved – Using word embeddings / word2vec for classification of entity

deep-learning, neural-networks, recursive-model, word2vec

I am trying to use word2vec word vectors to classify the words in a sentence by entity type.

For example, given the sentence:

"Google gives information about Nigeria"

I want to classify Nigeria as a location.

Suppose I have good word2vec vectors for each word. From some reading, I learned that a Recurrent Neural Network can be used for this task, the idea being that word2vec will give most locations similar word vectors.
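The "similar vectors for locations" intuition can be illustrated with cosine similarity. The tiny hand-made 3-d vectors below are placeholders, not real word2vec output; with a trained model, the same comparison would be made on its learned vectors.

```python
import numpy as np

# Toy 3-d vectors standing in for real word2vec embeddings (an assumption;
# real vectors would come from a model trained on a large corpus).
vectors = {
    "nigeria": np.array([0.9, 0.1, 0.0]),
    "kenya":   np.array([0.8, 0.2, 0.1]),
    "google":  np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Locations end up closer to each other than to non-locations.
print(cosine(vectors["nigeria"], vectors["kenya"]))   # high (locations)
print(cosine(vectors["nigeria"], vectors["google"]))  # lower (location vs. company)
```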

But my questions are:

a.) Suppose a new location appears, say Russia. Do I need to assign a new word vector for this location?

b.) What if my training input is not grammatical? For example:

"Google information Nigeria"

Everything other than Nigeria is given a non-location label. Will this approach still find new locations in non-grammatical sentences?

Please help .

Best Answer

Assuming that "adding Russia" means just adding it to your vocabulary, there is no simple meaningful way to get a corresponding word vector.

Remember that you learned your vectors not from a list of relevant words (a vocabulary) but from a large corpus. word2vec is a distributional method: it relies on observing the contexts in which Russia occurs, not the word itself. You could fix the learned embedding and, in a second pass over the corpus, learn only the vector for 'Russia'. However, that won't be much cheaper than learning the whole embedding from scratch.
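A rougher but much cheaper shortcut in the same spirit: keep the embedding frozen and approximate the new word's vector as the average of the (already-learned) vectors of the words observed around it. This is only a heuristic sketch, not the second-pass training described above, and the vectors and contexts below are toy placeholders.

```python
import numpy as np

# Already-learned embedding, held fixed (toy 2-d vectors as stand-ins).
vectors = {
    "visit":   np.array([0.3, 0.7]),
    "capital": np.array([0.8, 0.2]),
    "of":      np.array([0.5, 0.5]),
}

def infer_vector(contexts, vectors):
    """Cheap approximation: average the fixed vectors of all context words
    observed around the new word, instead of re-training the embedding."""
    ctx = [vectors[w] for sent in contexts for w in sent if w in vectors]
    return np.mean(ctx, axis=0)

# Contexts in which the unseen word "russia" was observed (hypothetical data).
contexts = [["visit", "capital"], ["capital", "of"]]
russia_vec = infer_vector(contexts, vectors)
```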

Alternatively, try to find an embedding technique that learns a parametric mapping from words to vectors. But I am not entirely sure that exists...
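Such a technique does exist: fastText represents a word as the sum of vectors for its character n-grams, which gives a parametric mapping from any string (including out-of-vocabulary words) to a vector. Below is a sketch of just that composition step; the n-gram vectors here are random placeholders, whereas fastText learns them jointly with the word vectors on a large corpus.

```python
import numpy as np

def char_ngrams(word, n=3):
    # fastText-style character n-grams with boundary markers "<" and ">".
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

# Placeholder n-gram vectors; in fastText these are trained, not random.
rng = np.random.default_rng(0)
ngram_vectors = {}

def vector_for(word, dim=4):
    """Sum the vectors of the word's character n-grams; an unseen word still
    gets a vector because its n-grams overlap with seen words."""
    total = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in ngram_vectors:
            ngram_vectors[g] = rng.normal(size=dim)
        total += ngram_vectors[g]
    return total

v = vector_for("russia")
```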
