I'm curious: what does subtracting vectors, as in [man] - [woman], do in Google's word2vec analogy calculation? Is it a measure of how different the two vectors are? That is, is
man - woman ≈ king - queen
saying that the difference between man and woman is (approximately) the same as the difference between king and queen?
Best Answer
Yes, that's my understanding of their interpretation. It's the reasoning behind why you'd expect (as observed) that [king] - [man] + [woman] ≈ [queen], or [Paris] - [France] + [China] ≈ [Beijing].

The idea is perhaps that vectors are approximately sums of their semantic components, so that [king] includes a "male" component as well as "ruler", "person", and whatever else, and [queen] has basically the same set of components except that it has "female" instead of "male". [man] - [woman] would then end up at ["male"] - ["female"], so subtracting it from [king] would just swap the "male" concept for "female".

I kind of doubt there's a more complete understanding of it than that, though I'm not familiar with all of the literature on the subject and someone may have studied it in more detail.
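
To make the "sum of semantic components" picture concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the four component directions, the `compose` and `analogy` helpers, and the four-word vocabulary are all made up by hand, not learned by word2vec. Real embeddings are only approximately additive, so the exact equality seen here would not hold in practice.

```python
import numpy as np

# Hypothetical, hand-picked component directions (NOT learned by word2vec).
COMPONENTS = {
    "male":   np.array([1.0, 0.0, 0.0, 0.0]),
    "female": np.array([0.0, 1.0, 0.0, 0.0]),
    "ruler":  np.array([0.0, 0.0, 1.0, 0.0]),
    "person": np.array([0.0, 0.0, 0.0, 1.0]),
}


def compose(*names):
    """Build a toy word vector as the sum of its semantic components."""
    return np.sum([COMPONENTS[n] for n in names], axis=0)


# Toy vocabulary: each word is just a sum of components.
WORDS = {
    "man":   compose("male", "person"),
    "woman": compose("female", "person"),
    "king":  compose("male", "ruler", "person"),
    "queen": compose("female", "ruler", "person"),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c):
    """'a is to b as c is to ?': nearest word to [c] - [a] + [b]."""
    target = WORDS[c] - WORDS[a] + WORDS[b]
    return max(WORDS, key=lambda w: cosine(WORDS[w], target))


# Both differences reduce to ["male"] - ["female"] in this toy setup.
gender_from_people = WORDS["man"] - WORDS["woman"]
gender_from_royals = WORDS["king"] - WORDS["queen"]

print(np.allclose(gender_from_people, gender_from_royals))
print(analogy("man", "woman", "king"))  # prints "queen"
```

In this setup the shared "person" (and "ruler") components cancel in each difference, which is why both subtractions land on the same vector and why moving [king] along that direction reaches [queen].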