My question may be a silly one, so I apologize in advance.
I was trying to use the GloVe model pre-trained by the Stanford NLP group (link). However, I noticed that some of my similarity scores were negative.
That immediately prompted me to look at the word-vector data file. Apparently, the values in the word vectors were allowed to be negative. That explained why I saw negative cosine similarities.
I am used to the cosine similarity of frequency vectors, whose values are bounded in $[0, 1]$. I know that a dot product, and hence a cosine, can be positive or negative depending on the angle between the vectors. But I have a hard time understanding and interpreting a negative cosine similarity.
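To make the contrast concrete, here is a small sketch. The vectors are made-up toy values (not real GloVe embeddings), and the function name is illustrative: with non-negative count vectors every product term is $\ge 0$, so the cosine cannot drop below $0$, whereas vectors with signed components can produce a negative cosine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy term-frequency vectors: every component is >= 0,
# so every product x*y is >= 0 and the cosine stays in [0, 1].
freq_1 = [3, 0, 1, 2]
freq_2 = [0, 4, 0, 0]   # shares no terms with freq_1
freq_3 = [1, 1, 0, 2]

# Hypothetical signed "embedding" vectors, chosen only to illustrate
# the sign; they are not taken from any GloVe file.
emb_1 = [0.5, -0.2, 0.1]
emb_2 = [-0.4, 0.3, -0.1]

print(cosine_similarity(freq_1, freq_2))  # 0.0 (no overlap)
print(cosine_similarity(freq_1, freq_3))  # positive, about 0.76
print(cosine_similarity(emb_1, emb_2))    # negative, about -0.97
```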
For example, if I have a pair of words giving similarity of -0.1, are they less similar than another pair whose similarity is 0.05? How about comparing similarity of -0.9 to 0.8?
Or should I just look at the absolute value of the minimal angular difference from $n\pi$? Or at the absolute value of the scores?
Many many thanks.
Best Answer
Given two vectors $a$ and $b$, the angle $\theta$ between them is obtained from their scalar product and their norms:
$$ \cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|} $$
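A minimal sketch of this formula in plain Python (the function names are illustrative, not from any library). It also maps a cosine back to an angle with `acos`: because $\cos$ is strictly decreasing on $[0, \pi]$, a higher score always corresponds to a smaller angle, so scores such as $0.8$, $0.05$, $-0.1$ and $-0.9$ can be ranked directly, negative values included.

```python
import math

def cosine(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||) for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def angle_degrees(cos_value):
    """Recover theta in [0, 180] degrees; clamping guards against
    floating-point results slightly outside [-1, 1]."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

a = [3.0, 4.0]
print(cosine(a, a))             # same direction -> 1.0
print(cosine(a, [-4.0, 3.0]))   # orthogonal -> 0.0
print(cosine(a, [-3.0, -4.0]))  # opposite -> -1.0

# The angle grows as the score decreases, so the ordering is preserved.
for s in [0.8, 0.05, -0.1, -0.9]:
    print(s, round(angle_degrees(s), 1))
```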
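Since $\cos(\theta)$ lies in the range $[-1, 1]$:

- $1$ indicates vectors pointing in the same direction (maximal similarity);
- $0$ indicates orthogonal (unrelated) vectors;
- $-1$ indicates vectors pointing in opposite directions;

and intermediate values indicate intermediate degrees of similarity.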
Example: let $U_1$ and $U_2$ be two users, and let $sim(U_1, U_2)$ be the similarity between them according to their taste in movies: