I am trying to understand the similarity between Latent Dirichlet Allocation (LDA) and word2vec for calculating word similarity.
As I understand it, LDA maps words to a vector of probabilities over latent topics, while word2vec maps them to a vector of real numbers (related to a singular value decomposition of pointwise mutual information; see O. Levy, Y. Goldberg, "Neural Word Embedding as Implicit Matrix Factorization"; see also How does word2vec work?).
I am interested both in the theoretical relation (can one be considered a generalization or variation of the other?) and in the practical one (when to use one but not the other?).
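To make this concrete, here is a minimal sketch of what I mean, using gensim (the toy corpus, hyperparameters, and helper names are my own illustrative assumptions, not part of either method):

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [
    ["cat", "dog", "pet", "animal"],
    ["dog", "bark", "pet"],
    ["stock", "market", "finance"],
    ["finance", "bank", "market"],
]

# word2vec: each word -> dense vector of real numbers (skip-gram: sg=1)
w2v = Word2Vec(docs, vector_size=20, window=2, min_count=1, sg=1, seed=0)
print("word2vec:", w2v.wv.similarity("dog", "cat"))

# LDA: each word -> vector of probabilities over latent topics,
# read off as a column of the topic-term matrix
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
topic_term = lda.get_topics()  # shape: (num_topics, vocab_size)

def lda_word_vector(word):
    col = topic_term[:, dictionary.token2id[word]]
    # proportional to p(topic | word), assuming a uniform topic prior
    return col / col.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("LDA:", cosine(lda_word_vector("dog"), lda_word_vector("cat")))
```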
Best Answer
An answer to Topic models and word co-occurrence methods covers the difference: skip-gram word2vec is a compression of pointwise mutual information (PMI); a sketch of this factorization view follows the list below.
So:

- neither method is a generalization of the other,
- word2vec lets you use vector geometry, e.g. the word analogy v(king) - v(man) + v(woman) ≈ v(queen),
- LDA captures correlations beyond pairwise co-occurrence,
- LDA gives interpretable topics.
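To make the PMI connection concrete, here is a rough sketch in plain numpy (toy counts; Levy and Goldberg's actual result is that skip-gram with negative sampling implicitly factorizes PMI shifted by log k for k negative samples, which this simplified version ignores):

```python
import numpy as np

words = ["cat", "dog", "pet", "stock", "market"]
# Hypothetical symmetric co-occurrence counts: C[i, j] = count of word i near word j.
C = np.array([
    [0, 4, 3, 0, 0],
    [4, 0, 5, 0, 0],
    [3, 5, 0, 1, 0],
    [0, 0, 1, 0, 6],
    [0, 0, 0, 6, 0],
], dtype=float)

total = C.sum()
p_w = C.sum(axis=1) / total          # marginal probability of each word
p_c = C.sum(axis=0) / total          # marginal probability of each context word
with np.errstate(divide="ignore"):   # log(0) -> -inf is clipped away below
    pmi = np.log((C / total) / np.outer(p_w, p_c))
ppmi = np.maximum(pmi, 0.0)          # positive PMI: keep only informative pairs

# Truncated SVD gives dense word vectors, analogous to word2vec embeddings.
U, S, _ = np.linalg.svd(ppmi)
k = 2
vectors = U[:, :k] * np.sqrt(S[:k])  # symmetric split of the singular values

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors[0], vectors[1]))  # cat vs dog (same cluster)
print(cosine(vectors[0], vectors[3]))  # cat vs stock (different cluster)
```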
Some of the differences are also discussed in the slides word2vec, LDA, and introducing a new hybrid algorithm: lda2vec by Christopher Moody.