My objective is to implement a topic model for a large number of documents (20M or 30M). Let us assume that the number of topics is fixed at 50.
I think implementing an LDA for the above problem would not be difficult. However, I have yet to find an answer for an NMF model. I have read that it is NOT easy to implement an NMF model for a large number of documents.
Is it really not possible to implement an NMF model for my problem?
Best Answer
Note on implementing LDA for this problem: there are well-designed inference algorithms for huge numbers of documents. Specifically, you should check out "Online LDA", which can adaptively train the topics looking at small chunks of documents at a time.
Paper: http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
Matt Hoffman has python code available: http://www.cs.princeton.edu/~blei/topicmodeling.html