Solved – the function that is being optimized in word2vec

word2vec

The following question is about the Skip-gram model, but it would be a plus (though not essential) to answer it for the CBOW model as well.

Word2Vec uses neural networks, and a neural network learns by gradient descent on some objective function. So my question is:

  • How are the words inputted into a Word2Vec model? In other words, what part of the neural network is used to derive the vector representations of the words?
  • What part of the neural network are the context vectors pulled from?
  • What is the objective function which is being minimized?

Best Answer

How are the words inputted into a Word2Vec model? In other words, what part of the neural network is used to derive the vector representations of the words?

See Input vector representation vs output vector representation in word2vec
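To make the linked answer concrete, here is a minimal NumPy sketch (the vocabulary size, dimensions, and variable names are illustrative, not from the linked answer): in skip-gram, a word enters the network as a one-hot vector; multiplying it by the input weight matrix simply selects one row of that matrix, and that row is the word's vector. The context ("output") vectors live in the second weight matrix.

```python
import numpy as np

vocab_size, embed_dim = 5, 3  # toy sizes for illustration
rng = np.random.default_rng(0)

W = rng.standard_normal((vocab_size, embed_dim))      # input embeddings: one word vector per row
W_out = rng.standard_normal((embed_dim, vocab_size))  # output embeddings: one context vector per column

word_id = 2
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

hidden = one_hot @ W     # identical to W[word_id]: the "projection" is just a row lookup
scores = hidden @ W_out  # one score per candidate context word
probs = np.exp(scores - scores.max())
probs /= probs.sum()     # softmax over the vocabulary

print(np.allclose(hidden, W[word_id]))  # → True: the word vector is a row of W
```

This is why, in practice, the one-hot multiplication is never performed explicitly; implementations index directly into the embedding matrix.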

What is the objective function which is being minimized?

The original word2vec papers are notoriously unclear on some points pertaining to the training of the neural network (see Why do so many publishing venues limit the length of paper submissions?). I advise you to look at {1-4}, which answer this question.
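For reference, in the standard formulation (paraphrased here, not quoted from {1-4}), skip-gram maximizes the average log-probability of the context words within a window of size $c$ around each centre word; the quantity minimized by gradient descent is its negative:

```latex
% Skip-gram objective: maximize the average log-probability of context
% words w_{t+j} around each centre word w_t in a corpus of length T
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}}
    \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) =
    \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}
         {\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```

Here $v_w$ and $v'_w$ are the input (word) and output (context) vector of $w$, and $W$ is the vocabulary size. Because the softmax denominator sums over the whole vocabulary, the papers replace it in practice with hierarchical softmax or negative sampling, which is one source of the confusion the references above clear up.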


References:
