Solved – Interpreting Doc2Vec, Cosine Similarity between Doc Vectors and Word Vectors

machine learning, natural language, neural networks, word2vec

After training a doc2vec network, can you only compare word vectors with each other and doc vectors with each other, or does it also make sense to compare word vectors with doc vectors? This of course assumes that the doc vectors have the same dimensionality as the word vectors (as in a neural network that sums all word and doc vectors for classification instead of concatenating them).

For instance, if a document vector has a high cosine similarity to a particular word vector, does this imply that the document is somehow semantically similar to that word, and vice versa? Thanks!
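To make the comparison concrete, here is a minimal sketch of the cosine similarity the question is asking about. The vectors `doc_vec` and `word_vec` are hypothetical stand-ins for a learned document vector and word vector of the same dimensionality; in practice they would come from a trained model rather than `np.random`.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two equal-length vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for a learned doc vector and word vector
# from the same model, both with the same dimensionality (here 100).
doc_vec = np.random.rand(100)
word_vec = np.random.rand(100)

print(cosine_similarity(doc_vec, word_vec))  # value in [-1, 1]
```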

Best Answer

There's a paper, Document Embedding With Paragraph Vectors, which does PV-DBOW doc-vector training simultaneous with skip-gram word-vector training and gets interesting results where word-vectors and doc-vectors can be meaningfully compared or even added/subtracted.

Their mode is analogous to gensim's Doc2Vec dm=0, dbow_words=1 mode. (Note that without dbow_words=1, DBOW training does not need or train per-word vectors... so any word-vectors you see in the model are just the random initializations.) Other PV-DM modes (dm=1) also create word-vectors that may be comparable to doc-vectors.
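As a rough illustration of that mode, here is a minimal sketch using gensim's Doc2Vec (gensim 4.x API assumed; the toy corpus, tag names, and parameter values are made up for the example). With dm=0, dbow_words=1 the model trains word vectors alongside the doc vectors, so a doc vector can be queried against the word-vector space.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus; real use would have many more, longer documents.
texts = [
    ["machine", "learning", "is", "fun"],
    ["neural", "networks", "learn", "representations"],
    ["word", "vectors", "capture", "meaning"],
]
docs = [TaggedDocument(words=t, tags=[str(i)]) for i, t in enumerate(texts)]

model = Doc2Vec(
    documents=docs,
    vector_size=50,
    dm=0,           # PV-DBOW doc-vector training
    dbow_words=1,   # also train skip-gram word vectors in the same space
    window=5,
    min_count=1,
    epochs=40,
)

doc_vec = model.dv["0"]                              # learned document vector
print(model.wv.similar_by_vector(doc_vec, topn=5))   # nearest word vectors by cosine
```

With a meaningfully large corpus, the nearest word vectors to a doc vector tend to be words characteristic of that document's topic, which is the kind of cross-comparison the question describes.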

(Note, though, that the "concatenative input" mode (dm=1, dm_concat=1) never mixes candidate doc-vectors and candidate word-vectors into the same input slots. Thus they don't 'pull' against each other in the same weights/dimensions during training, and are unlikely to then be comparable as in the other modes. This dm_concat mode also results in the biggest, slowest-to-train models and may only show benefits – if ever – with gigantic training sets.)

However, there's not yet a lot of published experience about how to interpret these word-vectors and doc-vectors in the "same space", or how to make them more useful for certain purposes.