Sentiment Analysis – Has the State-of-the-Art Performance of Paragraph Vectors for Sentiment Analysis Been Replicated?

natural-language, reproducible-research, sentiment-analysis, text-mining, word-embeddings

I was impressed by the results in the ICML 2014 paper "Distributed Representations of Sentences and Documents" by Le and Mikolov. The technique they describe, called "paragraph vectors", learns unsupervised representations of arbitrarily long paragraphs or documents, based on an extension of the word2vec model. The paper reports state-of-the-art performance on sentiment analysis using this technique.
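For readers unfamiliar with the technique: the core idea of the PV-DM ("distributed memory") variant is that each document gets its own trainable vector, which is combined with context word vectors to predict the next word; after training, the document vector serves as the document's representation. The following is only a minimal sketch of that idea on toy data, using a full softmax instead of the paper's hierarchical softmax; the corpus, dimensions, and hyperparameters are all illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: two short "documents".
docs = [["the", "movie", "was", "great"],
        ["the", "movie", "was", "awful"]]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

dim, lr, window = 8, 0.1, 2
W = rng.normal(0, 0.1, (len(vocab), dim))   # word vectors (shared across docs)
D = rng.normal(0, 0.1, (len(docs), dim))    # one paragraph vector per document
U = rng.normal(0, 0.1, (dim, len(vocab)))   # output (softmax) weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

losses = []
for epoch in range(200):
    total = 0.0
    for d_i, doc in enumerate(docs):
        for t, target in enumerate(doc):
            ctx = [idx[w] for w in doc[max(0, t - window):t]]
            # PV-DM: average the paragraph vector with the context word vectors.
            h = (D[d_i] + sum(W[c] for c in ctx)) / (1 + len(ctx))
            p = softmax(h @ U)
            total += -np.log(p[idx[target]] + 1e-12)
            # Gradient of cross-entropy w.r.t. the logits, then backprop.
            err = p.copy()
            err[idx[target]] -= 1.0
            g_h = U @ err
            U -= lr * np.outer(h, err)
            D[d_i] -= lr * g_h / (1 + len(ctx))
            for c in ctx:
                W[c] -= lr * g_h / (1 + len(ctx))
    losses.append(total)

# After training, D[0] and D[1] are the learned document representations,
# which a downstream classifier (e.g. logistic regression) would consume.
```

Note that this trains every parameter jointly; at inference time the paper instead freezes the word and softmax weights and fits only a new paragraph vector for each unseen document, which is one of the details that makes exact replication tricky.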

I was hoping to evaluate this technique on other text classification problems, as an alternative to the traditional bag-of-words representation. However, I ran across a post by the second author in a thread in the word2vec Google group that gave me pause:

I tried myself to reproduce Quoc's results during the summer; I could get error rates on the IMDB dataset to around 9.4% – 10% (depending on how good
the text normalization was). However, I could not get anywhere close
to what Quoc reported in the paper (7.4% error, that's a huge
difference) … Of course we also asked Quoc about the code; he
promised to publish it but so far nothing has happened. … I am starting
to think that Quoc's results are actually not reproducible.

Has anyone had success reproducing these results yet?

Best Answer

A footnote in http://arxiv.org/abs/1412.5335 (one of whose authors is Tomas Mikolov) says:

In our experiments, to match the results from (Le & Mikolov, 2014), we followed the suggestion by Quoc Le to use hierarchical softmax instead of negative sampling. However, this produces the 92.6% accuracy result only when the training and test data are not shuffled. Thus, we consider this result to be invalid.
