Solved – How does LDA (Latent Dirichlet Allocation) assign a topic distribution to a new document

latent-dirichlet-alloc, natural-language, non-negative-matrix-factorization, topic-models

I am new to topic modeling and have read about LDA and NMF (Non-negative Matrix Factorization). I understand how the training process works. Let's say I have 100 documents and I want to train an LDA model on these documents with 10 topics. However, I don't really understand how this model assigns a topic distribution to an unseen document.

I used Gensim. After training, I have a trained LDA model and a dictionary of the most frequent words. Let's say I have an unseen new document with the following text:

This is just a test text about topic modeling and LDA. 

Can someone explain, in terms of algorithmic steps, how a topic distribution is assigned to this new document? The same goes for the NMF method. For context, my training setup looks roughly like the sketch below.
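A minimal sketch of the setup (the `documents` list is just a placeholder for the real 100-document corpus, and the parameters are illustrative):

```python
# Minimal Gensim training sketch: 100 documents, 10 topics.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

documents = [
    "text of the first training document",
    "text of the second training document",
    # ... 100 documents in total
]
tokenized = [doc.lower().split() for doc in documents]

dictionary = Dictionary(tokenized)                          # maps each word to an integer id
corpus = [dictionary.doc2bow(toks) for toks in tokenized]   # bag-of-words vectors

# Train LDA with 10 topics on the bag-of-words corpus.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=10, passes=10)
```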

Best Answer

Strictly speaking, you would re-run inference (i.e., training) on the full set of documents: the old ones and the new ones together. A shortcut that approximates this well is to run Gibbs sampling only on the new documents while keeping the statistics obtained during training fixed, as described by @SheldonCooper in "Topic prediction using latent Dirichlet allocation".
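In Gensim this shortcut corresponds to simply feeding a new bag-of-words vector to the trained model: the topic-word distributions stay fixed and only the document's topic proportions are inferred (Gensim uses variational inference rather than Gibbs sampling, but the principle is the same). A minimal sketch, assuming `lda` and `dictionary` are the trained model and dictionary from the question:

```python
# Infer a topic distribution for an unseen document.
# The learned topic-word distributions are left unchanged; only the
# document-topic proportions (theta) for this one document are estimated.
new_doc = "This is just a test text about topic modeling and LDA."
bow = dictionary.doc2bow(new_doc.lower().split())  # words not in the dictionary are ignored

topic_dist = lda.get_document_topics(bow, minimum_probability=0.0)
print(topic_dist)  # e.g. [(0, 0.03), (1, 0.55), ..., (9, 0.02)], summing to ~1
```

Note that `doc2bow` drops words that never appeared in the training dictionary, so a document made up entirely of unseen words falls back to a near-uniform distribution determined by the Dirichlet prior.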
