In word2vec's CBOW and skip-gram models, how does choosing word vectors from $W$ (input word matrix) vs. choosing word vectors from $W'$ (output word matrix) impact the quality of the resulting word vectors?
CBOW:
Skip-gram:
Best Answer
Garten et al. {1} compared word vectors obtained by adding the input word vectors to the output word vectors against word vectors obtained by concatenating the input word vectors with the output word vectors. In their experiments, concatenation yielded significantly better results.
The video lecture {2} recommends averaging the input word vectors with the output word vectors, but doesn't compare this against concatenating them.
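The three combination strategies mentioned above (summing, averaging, concatenating) can be sketched as follows. This is a minimal illustration with random matrices standing in for the learned input matrix $W$ and output matrix $W'$; the variable names are hypothetical and not tied to any particular word2vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100

# Stand-ins for the learned input embedding matrix W and
# output embedding matrix W' (one row per vocabulary word).
W = rng.standard_normal((vocab_size, dim))
W_prime = rng.standard_normal((vocab_size, dim))

# Sum: same dimensionality as the originals.
summed = W + W_prime                                  # (vocab_size, dim)

# Average: same dimensionality, just a rescaled sum.
averaged = (W + W_prime) / 2.0                        # (vocab_size, dim)

# Concatenation: doubles the dimensionality per word.
concatenated = np.concatenate([W, W_prime], axis=1)   # (vocab_size, 2*dim)
```

Note that summing and averaging produce vectors that differ only by a constant scale factor, so they are equivalent under cosine similarity; concatenation, by contrast, preserves both matrices' information at the cost of doubled dimensionality.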
References: