I had the same problem understanding it. As far as I can tell, the output score vector is the same for all $C$ context positions, because it is computed from the same center word. What differs is the error: each position is compared against a different one-hot target vector, so each one yields a different error vector. These error vectors are then used in back-propagation to update the weights.
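To make this concrete, here is a minimal sketch in plain Python. The scores, the vocabulary size of 4, and the context indices are made-up numbers for illustration, not values from the linked post. It assumes the usual softmax output layer and shows one shared probability vector producing a different error vector for each context position.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    """One-hot target vector with a 1.0 at `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]

# Hypothetical toy setup: vocabulary of 4 words, C = 2 context positions.
scores = [1.0, 2.0, 0.5, -1.0]   # output scores for one center word (assumed)
probs = softmax(scores)          # the same vector is used for every context position

context_ids = [1, 3]             # indices of the true context words (assumed)

# One error vector per context position: prediction minus one-hot target.
errors = [
    [p - t for p, t in zip(probs, one_hot(c, len(probs)))]
    for c in context_ids
]

# The per-position errors are summed before being back-propagated.
summed_error = [sum(column) for column in zip(*errors)]

for c, e in zip(context_ids, errors):
    print(c, [round(v, 3) for v in e])
```

The two printed error vectors differ only in the entry where the one-hot target is 1, which is exactly the sense in which the prediction is shared but the errors are not.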
Please correct me if I'm wrong.
Source: https://iksinc.wordpress.com/tag/skip-gram-model/
The answer to the question referenced by @amoeba in the comment on your question answers this quite well, but I would like to make two points.
First, to expand on a point in that answer, the objective being minimized is not the negative log of the softmax function. Rather, it is defined as a variant of noise contrastive estimation (NCE), which boils down to a set of $K$ logistic regressions. One is used for the positive sample (i.e., the true context word given the center word), and the remaining $K-1$ are used for the negative samples (i.e., false context words paired with the same center word).
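As a rough sketch of what "a set of $K$ logistic regressions" means, the loss for one center word can be written as one logistic term for the true context word plus one per negative sample. The function name `negative_sampling_loss` and the 2D vectors below are hypothetical and chosen only for illustration; this is not the reference word2vec implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center, positive, negatives):
    """Sum of K logistic losses: 1 positive pair plus K-1 negative pairs."""
    # Positive pair: push the inner product up (label 1).
    loss = -math.log(sigmoid(dot(center, positive)))
    # Negative pairs: push the inner product down (label 0).
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss

# Made-up 2D vectors, assumed for illustration only.
center = [1.0, 0.0]
positive = [2.0, 0.0]

loss_aligned = negative_sampling_loss(center, positive, [[2.0, 0.0]])
loss_perpendicular = negative_sampling_loss(center, positive, [[0.0, 2.0]])
loss_opposed = negative_sampling_loss(center, positive, [[-2.0, 0.0]])

print(loss_aligned, loss_perpendicular, loss_opposed)
```

With these numbers the loss is largest when the negative sample points the same way as the center word and smallest when it points the opposite way, so minimizing it drives the inner product with false context words toward large negative values.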
Second, the reason you would want a large negative inner product between the false context words and the center word is that it implies the words are maximally dissimilar. To see this, consider the formula for cosine similarity between two vectors $x$ and $y$:
$$
s_{cos}(x, y) = \frac{x^Ty}{\|x\|_2\,\|y\|_2}
$$
This attains a minimum of $-1$ when $x$ and $y$ are oriented in opposite directions and equals $0$ when $x$ and $y$ are perpendicular. If they are perpendicular, they contain none of the same information, while if they are oriented oppositely, they contain opposite information. If you imagine word vectors in 2D, this is like saying that the word "bright" has the embedding $[1\;0],$ "dark" has the embedding $[-1\;0],$ and "delicious" has the embedding $[0\;1].$ In our simple example, "bright" and "dark" are opposites. Predicting that something is "dark" when it is "bright" would be maximally incorrect as it would convey exactly the opposite of the intended information. On the other hand, the word "delicious" carries no information about whether something is "bright" or "dark", so it is oriented perpendicularly to both.
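The 2D example can be checked directly. The sketch below uses only the three embeddings given above; the helper name `cosine_similarity` is my own.

```python
import math

def cosine_similarity(x, y):
    """x^T y divided by the product of the Euclidean norms of x and y."""
    inner = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return inner / (norm_x * norm_y)

# Toy embeddings from the example above.
bright = [1.0, 0.0]
dark = [-1.0, 0.0]
delicious = [0.0, 1.0]

print(cosine_similarity(bright, dark))       # -1.0: opposite directions
print(cosine_similarity(bright, delicious))  # 0.0: perpendicular
print(cosine_similarity(bright, bright))     # 1.0: same direction
```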
This is also a reason why embeddings learned from word2vec perform well at analogical reasoning, which involves sums and differences of word vectors. You can read more about the task in the word2vec paper.
Best Answer
See "Input vector representation vs output vector representation in word2vec".
The original word2vec papers are notoriously unclear on some points pertaining to the training of the neural network (see "Why do so many publishing venues limit the length of paper submissions?"). I advise you to look at {1-4}, which answer this question.
References: