Solved – Topic modeling, LDA and NMF

matrixmodelingtopic-models

My objective is to implement a topic model for a large number of documents (20M or 30M). Let us assume that the number of topics is fixed at 50.

I think implementing an LDA for the above problem would not be difficult. However, I have yet to find an answer for an NMF model. I have read that it is NOT easy to implement an NMF model for a large number of documents.

Is it really not possible to implement an NMF model for my problem?

Best Answer

Note on implementing LDA for this problem: there are well-designed inference algorithms for huge numbers of documents. Specifically, you should check out "Online LDA", which can adaptively train the topics looking at small chunks of documents at a time.

Paper: http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf

Matt Hoffman has python code available: http://www.cs.princeton.edu/~blei/topicmodeling.html