The answer to the question that @amoeba referenced in the comment on your question covers this quite well, but I would like to add two points.
First, to expand on a point in that answer, the objective being minimized is not the negative log of the softmax function. Rather, it is defined as a variant of noise contrastive estimation (NCE), which boils down to a set of $K$ logistic regressions. One is used for the positive sample (i.e., the true context word paired with the center word), and the remaining $K-1$ are used for the negative samples (i.e., randomly drawn false/fake context words paired with the center word).
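To make this concrete, here is a minimal NumPy sketch of that objective for a single (center, context) pair; the variable names are mine, and the sampling of the negative words is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_center, u_pos, u_negs):
    """Negative-sampling objective for one (center word, context word) pair.

    v_center : embedding of the center word
    u_pos    : embedding of the true (positive) context word
    u_negs   : matrix whose rows are embeddings of the sampled negative words
    """
    # Logistic regression for the positive pair: rewards a large inner product.
    pos_term = -np.log(sigmoid(u_pos @ v_center))
    # One logistic regression per negative pair: the sign flip rewards a large
    # *negative* inner product between each fake context word and the center word.
    neg_term = -np.sum(np.log(sigmoid(-(u_negs @ v_center))))
    return pos_term + neg_term
```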
Second, the reason you would want a large negative inner product between the false context words and the center word is that this implies the words are maximally dissimilar. To see this, consider the formula for cosine similarity between two vectors $x$ and $y$:
$$
s_{cos}(x, y) = \frac{x^Ty}{\|x\|_2\,\|y\|_2}
$$
This attains a minimum of $-1$ when $x$ and $y$ are oriented in opposite directions and equals $0$ when $x$ and $y$ are perpendicular. If they are perpendicular, they contain none of the same information, while if they are oriented oppositely, they contain opposite information. If you imagine word vectors in 2D, this is like saying that the word "bright" has the embedding $[1\;0],$ "dark" has the embedding $[-1\;0],$ and "delicious" has the embedding $[0\;1].$ In our simple example, "bright" and "dark" are opposites. Predicting that something is "dark" when it is "bright" would be maximally incorrect as it would convey exactly the opposite of the intended information. On the other hand, the word "delicious" carries no information about whether something is "bright" or "dark", so it is oriented perpendicularly to both.
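Checking the toy example numerically only takes a few lines; the sketch below simply re-implements the cosine-similarity formula above with the 2D embeddings from the paragraph.

```python
import numpy as np

def cos_sim(x, y):
    """Cosine similarity: x^T y / (||x|| ||y||)."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

bright    = np.array([1.0, 0.0])
dark      = np.array([-1.0, 0.0])
delicious = np.array([0.0, 1.0])

print(cos_sim(bright, dark))       # -1.0: opposite directions, opposite information
print(cos_sim(bright, delicious))  #  0.0: perpendicular, no shared information
```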
This is also a reason why embeddings learned from word2vec perform well at analogical reasoning, which involves sums and differences of word vectors. You can read more about the task in the word2vec paper.
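As a rough illustration of what those sums and differences look like in practice, an analogy query such as "man is to king as woman is to ?" is typically answered by nearest-neighbour search around a vector-arithmetic result; the embedding matrix `emb` and vocabulary dict `vocab` below are placeholders, not part of any particular implementation.

```python
import numpy as np

def analogy(emb, vocab, a, b, c):
    """Return the word d that best completes "a is to b as c is to d",
    i.e. the word whose embedding is closest (in cosine similarity) to b - a + c."""
    query = emb[vocab[b]] - emb[vocab[a]] + emb[vocab[c]]
    query = query / np.linalg.norm(query)
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = normed @ query
    for w in (a, b, c):                  # exclude the query words themselves
        scores[vocab[w]] = -np.inf
    best = int(np.argmax(scores))
    return next(w for w, i in vocab.items() if i == best)

# e.g. analogy(emb, vocab, "man", "king", "woman") would ideally return "queen"
```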
Basically, YOLO combines detection and classification into one loss function: the green part corresponds to whether or not any object is present, while the red part encourages correctly determining which object it is, when one is present.
Since we are training on some labeled dataset, it means that $p_i(c)$ should be zero except for one class $c$, right?
Yes. Notice we are only penalizing the network when there is indeed an object present. But if your question is whether $p_i(c)\in\{0,1\}$, then usually yes, that is how it is done.
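To tie those two points together, here is a deliberately simplified sketch of the classification term of the loss for a single grid cell: the target distribution $p_i(c)$ is one-hot, and the whole term is masked out when no object is present (the variable names are mine, not the paper's, and the other loss terms are omitted).

```python
import numpy as np

def cell_class_loss(obj_present, p_hat, true_class):
    """Simplified classification term of the YOLO loss for one grid cell.

    obj_present : 1 if an object's centre falls in this cell, else 0
    p_hat       : predicted class probabilities, shape (num_classes,)
    true_class  : index of the ground-truth class
    """
    p = np.zeros_like(p_hat)
    p[true_class] = 1.0                  # one-hot target: p_i(c) is either 0 or 1
    # Squared error over classes, counted only when an object is present,
    # so cells containing no object are never penalized on classification.
    return obj_present * np.sum((p - p_hat) ** 2)
```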
Why are we interested in the confidence score? At the end of the neural net, do we have some decision algorithm that says: if this bounding box has confidence above a threshold $c_0$, then display it and choose the class with the highest probability?
Usually, yes, a threshold is needed exactly as you describe. Often it is a hyper-parameter that can be chosen by hand or cross-validated over.
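A minimal sketch of that kind of decision rule is below; the threshold $c_0$, the box format, and the scoring are illustrative, and a real pipeline would typically also apply non-maximum suppression.

```python
def filter_detections(boxes, confidences, class_probs, c0=0.5):
    """Keep boxes whose confidence exceeds c0 and attach the most probable class."""
    kept = []
    for box, conf, probs in zip(boxes, confidences, class_probs):
        if conf > c0:
            best_class = max(range(len(probs)), key=probs.__getitem__)
            # Score the detection by confidence times class probability.
            kept.append((box, best_class, conf * probs[best_class]))
    return kept
```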
As for your other questions about the "confidence" score, I must agree that the nomenclature is confusing. There are two "viewpoints" one can have about it: (1) a probabilistic confidence measure of whether any object exists in the locale, and (2) a deterministic prediction of the overlap between the locally predicted bounding box $\hat{B}$ and the ground-truth one $B$. The two viewpoints are often conflated, and in some sense can be treated as "equivalent", since we can view $|B\cap \hat{B}|/|B\cup\hat{B}|\in[0,1]$ as a probability.
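Since the second viewpoint defines the confidence target as the intersection-over-union $|B\cap\hat{B}|/|B\cup\hat{B}|$, here is a small sketch of that computation for axis-aligned boxes; the $(x_1, y_1, x_2, y_2)$ corner format is an assumption on my part.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not intersect).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # The result lies in [0, 1], which is what lets it be read as a probability.
    return inter / union if union > 0 else 0.0
```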
As an aside, there are already a couple of other discussions of the YOLO loss:
Best Answer
The probabilities are being multiplied because you want to compute the probability of two (or more) events happening at the same time, which is equal to the product of the probabilities of the individual events, under the assumption that the events are independent. I highly recommend checking the basic Wikipedia articles on maximum likelihood before continuing, so that you understand the general mechanism.
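As a concrete sketch of that mechanism: the likelihood of several independent observations is the product of their individual probabilities, and in practice one maximizes the log-likelihood so the product becomes a sum. The Bernoulli coin-flip example below is purely illustrative, not the model from your question.

```python
import numpy as np

# Three independent coin flips (1 = heads) and a candidate head-probability theta.
data = np.array([1, 0, 1])
theta = 0.6

# Joint probability of the independent observations: the product of the
# individual probabilities, exactly because of the independence assumption.
likelihood = np.prod(theta ** data * (1 - theta) ** (1 - data))

# Taking logs turns the product into a sum, which is what is usually maximized
# numerically when fitting theta by maximum likelihood.
log_likelihood = np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))
```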