Solved – Input vector representation vs output vector representation in word2vec

natural language, neural networks, word embeddings, word2vec

In word2vec's CBOW and skip-gram models, how does choosing word vectors from $W$ (input word matrix) vs. choosing word vectors from $W'$ (output word matrix) impact the quality of the resulting word vectors?

CBOW:

[figure: CBOW architecture diagram]

Skip-gram:

[figure: skip-gram architecture diagram]

Best Answer

Garten et al. {1} compared word vectors obtained by adding the input word vectors to the output word vectors against word vectors obtained by concatenating the input word vectors with the output word vectors. In their experiments, concatenation yielded significantly better results:

[figure: results comparing addition vs. concatenation, from {1}]

The video lecture {2} recommends averaging the input word vectors with the output word vectors, but does not compare averaging against concatenating them.
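To make the two strategies concrete, here is a minimal NumPy sketch. The names `W` and `W_prime` and the matrix shapes are assumptions for illustration (rows indexed by vocabulary, one row per word), not taken from either reference:

```python
import numpy as np

V, d = 1000, 50  # assumed vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W = rng.standard_normal((V, d))        # input word matrix (projection layer)
W_prime = rng.standard_normal((V, d))  # output word matrix (output layer)

E_input = W                                      # option 1: input vectors only
E_avg = (W + W_prime) / 2                        # option 2: average, as suggested in {2}
E_concat = np.concatenate([W, W_prime], axis=1)  # option 3: concatenate, as in {1}

print(E_avg.shape)     # (V, d)  -- same dimensionality as the originals
print(E_concat.shape)  # (V, 2*d) -- doubled dimensionality
```

Note that concatenation doubles the embedding dimensionality, so downstream comparisons between the two strategies should account for the difference in vector size.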


References: