In word2vec's CBOW and skip-gram models, how does choosing word vectors from $W$ (input word matrix) vs. choosing word vectors from $W'$ (output word matrix) impact the quality of the resulting word vectors?
CBOW:
Skip-gram:
Best Answer
Garten et al. {1} compared word vectors obtained by adding the input word vectors to the output word vectors against word vectors obtained by concatenating the input word vectors with the output word vectors. In their experiments, concatenation yielded significantly better results.
The video lecture {2} recommends averaging the input word vectors with the output word vectors, but doesn't compare this against concatenating them.
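The three combination strategies mentioned above (summing, averaging, concatenating) can be sketched as follows. This is a minimal illustration with random matrices standing in for the learned input matrix $W$ and output matrix $W'$; the variable names are hypothetical and not tied to any particular word2vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100

# Stand-ins for the learned input embedding matrix W and
# output embedding matrix W' (one row per vocabulary word).
W = rng.standard_normal((vocab_size, dim))
W_prime = rng.standard_normal((vocab_size, dim))

# Sum: same dimensionality as the originals.
summed = W + W_prime                                  # (vocab_size, dim)

# Average: same dimensionality, just a rescaled sum.
averaged = (W + W_prime) / 2.0                        # (vocab_size, dim)

# Concatenation: doubles the dimensionality per word.
concatenated = np.concatenate([W, W_prime], axis=1)   # (vocab_size, 2*dim)
```

Note that summing and averaging produce vectors that differ only by a constant scale factor, so they are equivalent under cosine similarity; concatenation, by contrast, preserves both matrices' information at the cost of doubled dimensionality.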
References: