Solved – What does the average of word2vec vectors mean

machine-learning, svm, word-embeddings, word2vec

The paper that I am reading says,

tweet is represented by the average of the word embedding vectors of
the words that compose the tweet.

Does this mean that each word in the tweet (sentence) has to be represented as the average of its own word vector (still having the same length),

or does it mean that the sentence itself has to be the average of all the word vectors of the words the sentence contains?

I am confused.

Best Answer

You can think of it in terms of a physical analogy. Take a flat surface, like a table, and arrange 30 balls on it. Then cut the legs off the table and replace them with a single leg. To figure out where to put this leg, you need to find the center of mass of all 30 balls on the table. Assuming each ball has the same size and weight, the center of mass is the average position of all the balls.

[Figure: centers of mass for small sets of objects]

In the picture above, in the first example with 3 objects, you can see that the center of mass is much closer to the two objects that form a small cluster. The same idea applies to word vectors. Each word is an object, and a sentence (or tweet) is just a set of these objects. If many of the vectors from the tweet are close to each other in space, then the overall average will be close to this cluster and will be a good representation of the tweet.
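So the tweet vector is the element-wise mean of its word vectors, and it has the same dimensionality as a single word vector. A minimal sketch with NumPy, using hypothetical toy embeddings (in practice these would come from a trained word2vec model):

```python
import numpy as np

# Hypothetical 3-dimensional embeddings for illustration only;
# real word2vec vectors typically have 100-300 dimensions.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "movie": np.array([0.1, 0.9, 0.3]),
}

def tweet_vector(tokens, embeddings):
    """Represent a tweet as the mean of its word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

v = tweet_vector(["good", "great", "movie"], embeddings)
print(v.shape)  # (3,) -- same length as a single word vector
```

Note that the average is taken across words, per dimension, so the result lives in the same space as the individual word vectors.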

One remark: taking the average is effectively the same as just summing the vectors, because in most cases you will use cosine similarity to find close vectors. Dividing a vector by $n$ is the same as multiplying it by the scalar $1/n$, and the scale of a vector doesn't matter when you measure distance using angles.
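This scale invariance is easy to check numerically. A small sketch (toy vectors, illustration only) showing that the sum and the average of the same word vectors have identical cosine similarity to any query vector:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: depends only on the angle between a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy word vectors for one "tweet"
words = np.array([[0.9, 0.1],
                  [0.8, 0.3],
                  [0.2, 0.7]])

summed = words.sum(axis=0)
averaged = summed / len(words)  # average = sum scaled by 1/n

query = np.array([1.0, 0.5])

# Scaling by 1/n changes the length but not the direction,
# so both representations rank neighbors identically.
print(np.isclose(cosine(summed, query), cosine(averaged, query)))
```

Only the direction of the tweet vector matters under cosine similarity, which is why papers sometimes describe the same representation as a "sum" and sometimes as an "average".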