Machine Learning – LDA vs Word2Vec

latent-variable, machine-learning, natural-language, self-study, word2vec

I am trying to understand the similarity between Latent Dirichlet Allocation (LDA) and word2vec for calculating word similarity.

As I understand it, LDA maps words to a vector of probabilities over latent topics, while word2vec maps them to a vector of real numbers (related to a singular value decomposition of pointwise mutual information; see O. Levy and Y. Goldberg, "Neural Word Embedding as Implicit Matrix Factorization"; see also How does word2vec work?).
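
For concreteness, here is a minimal sketch of the two representations using gensim (the toy corpus, hyperparameters, and variable names are made up purely for illustration):

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

texts = [
    ["king", "queen", "palace", "crown"],
    ["man", "woman", "child", "family"],
    ["king", "man", "throne"],
    ["queen", "woman", "crown"],
]

# LDA: a word can be summarized by a probability vector over latent topics.
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=50, random_state=0)
term_topic = lda.get_topics()                    # shape (num_topics, vocab_size): P(word | topic)
col = term_topic[:, dictionary.token2id["king"]]
print("LDA topic vector for 'king':", col / col.sum())   # normalized over topics, sums to 1

# word2vec: the same word becomes a dense real-valued vector with no probabilistic meaning.
w2v = Word2Vec(sentences=texts, vector_size=10, window=2, min_count=1, sg=1, seed=0)
print("word2vec vector for 'king':", w2v.wv["king"])      # arbitrary real numbers
```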

I am interested both in the theoretical relations (can one be considered a generalization or variation of the other?) and in practical ones (when to use one but not the other).

Best Answer

An answer to Topic models and word co-occurrence methods covers the difference (skip-gram word2vec is a compression of pointwise mutual information (PMI)).
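
A minimal sketch of that view, assuming only numpy (the co-occurrence counts below are made up): build a word–context co-occurrence matrix, convert it to positive PMI, and take a truncated SVD; the resulting rows act as word vectors, which is roughly what skip-gram with negative sampling factorizes implicitly.

```python
import numpy as np

vocab = ["king", "queen", "man", "woman"]
# Toy co-occurrence counts C[i, j] = #(word i appears near word j); illustrative only.
C = np.array([
    [0, 8, 5, 1],
    [8, 0, 1, 5],
    [5, 1, 0, 7],
    [1, 5, 7, 0],
], dtype=float)

total = C.sum()
p_wc = C / total                                  # joint probability P(w, c)
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                         # positive PMI, the usual practical variant

# Rank-k factorization: word vectors from the left singular vectors.
U, S, _ = np.linalg.svd(ppmi)
k = 2
word_vectors = U[:, :k] * S[:k]
for w, v in zip(vocab, word_vectors):
    print(w, v.round(3))
```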

So:

  • neither method is a generalization of the other,
  • word2vec lets us use vector geometry, e.g. the word analogy $v_{king} - v_{man} + v_{woman} \approx v_{queen}$ (I wrote an overview of word2vec; see the sketch after this list),
  • LDA captures higher-order co-occurrences than pairwise (two-element) ones,
  • LDA gives interpretable topics.
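
A minimal sketch of the analogy arithmetic, assuming gensim and its downloadable pretrained GloVe vectors (any reasonably trained word vectors behave the same way):

```python
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")      # pretrained KeyedVectors (downloads on first use)
# v_king - v_man + v_woman: the nearest remaining word is typically "queen".
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```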

Some differences are discussed in the slides word2vec, LDA, and introducing a new hybrid algorithm: lda2vec by Christopher Moody.