First, let's lay out what we have and our assumptions about the shapes of the different vectors. Let:
- $W$ be the number of words in the vocabulary
- $u_i$ and $v_j$ be column vectors of shape $D \times 1$ ($D$ = dimension of the embeddings)
- $y$ be the one-hot encoded target, a column vector of shape $W \times 1$
- $\hat{y}$ be the softmax prediction, a column vector of shape $W \times 1$
- $\hat{y}_i = P(i|c) = \frac{exp(u_i^Tv_c)}{\sum_{w=1}^Wexp(u_w^Tv_c)}$
- Cross entropy loss: $J = -\sum_{i=1}^Wy_ilog({\hat{y_i}})$
- $U = [u_1, u_2, ..., u_k, ..., u_W]$ be the $D \times W$ matrix whose columns are the vectors $u_k$.
Now, we can write
$$J = - \sum_{i=1}^W y_i log(\frac{exp(u_i^Tv_c)}{\sum_{w=1}^Wexp(u_w^Tv_c)})$$
Simplifying,
$$ J = - \sum_{i=1}^Wy_i[u_i^Tv_c - log(\sum_{w=1}^Wexp(u_w^Tv_c))] $$
Now, we know that $y$ is one-hot encoded, so all its elements are zero except the one at, say, the $k^{th}$ index. That means there's only one non-zero term in the summation above, the one corresponding to $y_k$; all other terms are zero. So the cost can also be written as:
$$J = -y_k[u_k^Tv_c - log(\sum_{w=1}^Wexp(u_w^Tv_c))]$$
Note: $y_k$ above equals 1.
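If it helps to make this concrete, here is a small numpy sanity check (toy sizes; all variable names are just illustrative) that the full cross-entropy sum and the simplified one-hot form give the same value:

```python
import numpy as np

rng = np.random.default_rng(0)
W, D = 7, 5                      # toy vocab size and embedding dimension
U = rng.normal(size=(D, W))      # columns are the vectors u_1 ... u_W
v_c = rng.normal(size=D)         # center-word vector v_c
k = 3                            # index of the observed (target) word
y = np.zeros(W); y[k] = 1.0      # one-hot target

scores = U.T @ v_c                               # u_w^T v_c for every w
y_hat = np.exp(scores) / np.exp(scores).sum()    # softmax prediction

J_full = -np.sum(y * np.log(y_hat))                     # -sum_i y_i log(y_hat_i)
J_onehot = -(scores[k] - np.log(np.exp(scores).sum()))  # -(u_k^T v_c - log sum_w exp(u_w^T v_c))
assert np.isclose(J_full, J_onehot)
```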
Solving for $\frac{\partial J}{\partial v_c}$:
$$ \frac{\partial J}{\partial v_c} = -[u_k - \frac{\sum_{w=1}^Wexp(u_w^Tv_c)u_w}{\sum_{x=1}^Wexp(u_x^Tv_c)}]$$
This can be rearranged as:
$$\frac{\partial J}{\partial v_c} = \sum_{w=1}^W (\frac{exp(u_w^Tv_c)}{\sum_{x=1}^W exp(u_x^Tv_c)}u_w) - u_k$$
Using the softmax definition of $\hat{y}_w$ from the list above, we can rewrite this as:
$$\frac{\partial J}{\partial v_c} = \sum_{w=1}^W (\hat{y}_w u_w) - u_k$$
Now let's see how this can be written in matrix notation. Note that:
- $u_k$ can be written as the matrix-vector product $Uy$
- And $\sum_{w=1}^W (\hat{y}_w u_w)$ is a linear combination of the columns $u_w$ of $U$, each scaled by the corresponding $\hat{y}_w$. This again can be written as $U\hat{y}$
So the whole thing can be succinctly written as:
$$\frac{\partial J}{\partial v_c} = U[\hat{y} - y]$$
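As a quick sanity check of the derivation (again a toy numpy sketch with illustrative names, using the column-vector convention above), the closed form $U[\hat{y} - y]$ matches a finite-difference estimate of $\frac{\partial J}{\partial v_c}$:

```python
import numpy as np

rng = np.random.default_rng(0)
W, D = 7, 5                      # toy vocab size and embedding dimension
U = rng.normal(size=(D, W))      # columns are u_1 ... u_W
v_c = rng.normal(size=D)
k = 3                            # observed word index
y = np.zeros(W); y[k] = 1.0

def loss(v):
    scores = U.T @ v
    return -(scores[k] - np.log(np.exp(scores).sum()))

scores = U.T @ v_c
y_hat = np.exp(scores) / np.exp(scores).sum()
grad_closed = U @ (y_hat - y)                    # the derived U[y_hat - y]

eps = 1e-6                                       # central finite differences
grad_numeric = np.array([(loss(v_c + eps * e) - loss(v_c - eps * e)) / (2 * eps)
                         for e in np.eye(D)])
assert np.allclose(grad_closed, grad_numeric, atol=1e-5)
```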
Finally, note that we assumed the $u_i$ to be column vectors. If we had started with row vectors, we would get $U^T[\hat{y} - y]$, the same as what you were looking for.
Best Answer
Given an input word $w_I$, Skip-gram learns the probability distribution of words that are likely to co-occur with it in a context window of a given size. The $j$-th node on the output layer gives the probability of observing word $w_j$ in word $w_I$'s context window.
You seem to have some strange notation in your formula; $u$, for example, is referenced with both one and two subscripts.
I think this is a better way to see it:
Skip-gram models the probability of a word $w_o$ being observed within word $w_i$'s context window as:
$p(w_o | w_i) = y_o$
$y = Softmax(z)$
$z = W_i C^T$
where $W$ is the word vector matrix ($|V| \times d$) and $C$ is the context vector matrix ($|V| \times d$), given a vocabulary of size $|V|$. $z$ is a $|V|$-dimensional vector containing the dot product of the input word's vector $W_i$ with every context word vector. The $Softmax$ turns this into $y$, a probability distribution over the vocabulary (also a $|V|$-dimensional vector), which is indexed at position $o$ to get the probability of observing word $w_o$.
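Here's a minimal numpy sketch of that forward computation (toy sizes; $W$ and $C$ follow the notation above, everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                     # toy vocabulary size and embedding dimension
W = rng.normal(size=(V, d))      # word (input) vectors, one row per word
C = rng.normal(size=(V, d))      # context vectors, one row per word
i, o = 2, 5                      # input word index and context word index

z = W[i] @ C.T                   # dot product of W_i with every context vector, shape (|V|,)
y = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
print(y[o])                      # p(w_o | w_i)
```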
This makes it very clear that the goal is to align word and context vectors of words which tend to co-occur, and similarly to spread apart those of pairs of words which do not co-occur.
Hope that helps!