After training word vectors with word2vec, is it better to normalize them before using them for some downstream applications? I.e., what are the pros and cons of normalizing them?
Solved – Should I normalize word2vec’s word vectors before using them
natural language, word embeddings, word2vec
Best Answer
When the downstream applications only care about the direction of the word vectors (e.g. they only pay attention to the cosine similarity of two words), then normalize, and forget about length.
However, if the downstream applications are able to (or need to) take additional aspects into account, such as word significance or consistency in word usage (see below), then normalization might not be such a good idea.
As Levy et al., 2015 note (and, actually, most of the literature on word embeddings does the same), vectors are normalized to unit length before they are used for similarity calculations, which makes cosine similarity and the dot product equivalent.
Wilson and Schakel, 2015 similarly point out that most applications use not the word vectors themselves but the relations between them (for similarity and word-relation tasks, for instance), that normalized vectors were found to perform better on such tasks, and that vector length is therefore typically ignored.
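To make the "normalize and compare by cosine" route concrete, here is a minimal numpy sketch. The random matrix standing in for trained word2vec vectors, and names like embeddings, vocab, and unit_embeddings, are illustrative assumptions, not part of any particular library: after L2-normalizing each row, the dot product of two word vectors equals their cosine similarity.

```python
import numpy as np

# Toy stand-in for a trained word2vec embedding matrix (vocab_size x dim);
# in practice these rows would come from your trained model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))          # unnormalized word vectors
vocab = ["cat", "dog", "car", "truck", "the"]

# L2-normalize each row: afterwards every vector has length 1,
# so cosine similarity reduces to a plain dot product.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit_embeddings = embeddings / norms

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

i, j = vocab.index("cat"), vocab.index("dog")
print(cosine(embeddings[i], embeddings[j]))            # cosine on the raw vectors
print(np.dot(unit_embeddings[i], unit_embeddings[j]))  # dot product on unit vectors: same value
```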
Normalizing is equivalent to losing the notion of length. That is, once you normalize the word vectors, you forget the length (norm, modulus) they had right after the training phase.
However, it is sometimes worth taking the original length of the word vectors into consideration.
Schakel and Wilson, 2015 observed some interesting facts regarding the length of word vectors: a word that is consistently used in similar contexts tends to be represented by a longer vector than a word of the same frequency that is used in varying contexts, so not only the direction but also the length of a word vector carries information, and length combined with term frequency yields a useful measure of word significance.
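To illustrate (not reproduce) that observation, here is a rough sketch under stated assumptions: a random matrix again stands in for trained vectors, and vocab and corpus_tokens are hypothetical placeholders. The idea is simply to keep the unnormalized vectors and look at their lengths alongside term frequency as a crude significance signal; this is not the exact measure proposed in the paper.

```python
import numpy as np
from collections import Counter

# Placeholders: in practice `embeddings` would hold trained word2vec vectors
# and `corpus_tokens` would be the tokenized training corpus.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "truck", "the"]
embeddings = rng.normal(size=(len(vocab), 8))   # unnormalized word vectors
corpus_tokens = ["the", "cat", "the", "dog", "the", "car", "truck", "cat"]

freq = Counter(corpus_tokens)
lengths = np.linalg.norm(embeddings, axis=1)    # vector length per word

# Print words sorted by vector length, with their frequency: per Schakel & Wilson,
# a word used consistently in similar contexts tends to get a longer vector than
# an equally frequent word used in many different contexts.
for word, length in sorted(zip(vocab, lengths), key=lambda x: -x[1]):
    print(f"{word:>6}  freq={freq[word]}  ||v||={length:.3f}")
```

A practical consequence: if you normalize for similarity lookups, keep a copy of the original norms (or the raw vectors) so this length information is not lost.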