Solved – Word2vec that can distinguish words with different meanings

natural-language, word-embeddings, word2vec

Word2vec is a very successful method for converting words into dense vectors of real numbers. After training, it produces a look-up table that maps each word to its vector.
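To make the look-up-table behaviour concrete, here is a minimal sketch in plain Python (the words and vector values are made up for illustration; a real word2vec model would be trained with a library such as gensim):

```python
# Illustrative static look-up table, as word2vec would produce after training:
# one dense vector per word type. The vectors here are invented.
embeddings = {
    "book":  [0.21, -0.53, 0.88],
    "read":  [0.19, -0.47, 0.91],
    "table": [-0.60, 0.12, 0.05],
}

def vector_for(word):
    """Return the single stored vector for a word, independent of context."""
    return embeddings[word]

# The same vector comes back regardless of the surrounding sentence:
v1 = vector_for("book")  # as in "I read a book"   (noun)
v2 = vector_for("book")  # as in "book a flight"   (verb)
assert v1 == v2
```

This is exactly the limitation the question raises: because the table is keyed by the word alone, both uses of 'book' retrieve the identical vector.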

We know that many words have different meanings depending on the context. For example, 'book' can be a noun meaning written material or a verb meaning to reserve. It seems that word2vec cannot distinguish between the different meanings of a word, since each word has a single entry in the look-up table. Is there any research on enhancing word2vec so that it can distinguish the different meanings of a word?

Best Answer

You're right: because word2vec stores a single vector per word, it can't distinguish 'palm' the tree from 'palm' the part of a hand. More broadly, it struggles to handle polysemy and homonymy.

The typical way to address this is to learn word sense embeddings instead of word embeddings. In general, this requires assigning word tokens to sense categories to learn embeddings for each meaning ("sense") that the word may have. (Basically, you learn separate vectors for 'palm₁' and 'palm₂'.) An excellent survey of word sense embedding techniques is "From Word To Sense Embeddings: A Survey on Vector Representations of Meaning" (Camacho-Collados and Pilehvar, 2018).
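As a rough sketch of the idea (not a real trained model), one can imagine the look-up table keyed by sense rather than by word, together with some disambiguation step that picks a sense from the context. The sense labels, cue words, and vectors below are all invented for illustration; real systems induce or annotate senses far more carefully:

```python
# Hypothetical sense-level look-up table: 'palm' has two entries, one per
# meaning, each with its own vector and a toy set of context cue words.
sense_embeddings = {
    "palm_1": {"vector": [0.70, 0.10, -0.20], "cues": {"tree", "coconut", "beach"}},
    "palm_2": {"vector": [-0.30, 0.60, 0.40], "cues": {"hand", "fingers", "grip"}},
}

def sense_vector(word, context_words):
    """Naive disambiguation: pick the sense whose cue words overlap most
    with the context, then return that sense's vector."""
    candidates = {k: v for k, v in sense_embeddings.items()
                  if k.startswith(word + "_")}
    best_key = max(candidates,
                   key=lambda k: len(candidates[k]["cues"] & set(context_words)))
    return best_key, candidates[best_key]["vector"]

sense, vec = sense_vector("palm", ["the", "tree", "on", "the", "beach"])
assert sense == "palm_1"   # tree/beach context selects the tree sense
```

The point of the sketch is only the data layout: unlike word2vec's one-vector-per-word table, a sense-embedding model keeps separate vectors for 'palm₁' and 'palm₂' and needs some mechanism, here a trivial bag-of-words overlap, to decide which one a token refers to.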