Solved – A single document as input to LDA

machine learning, natural language, text mining, topic-models

Topic modelling is usually applied to a collection of documents, which forms the input. But what if I have only a single document and want to see its underlying topics? I have heard that in such cases you can break the document up into paragraphs, but why is that necessary? Does it mean that latent Dirichlet allocation (LDA) cannot be used, or is not meant to be used, with a single document as the input?

Best Answer

You can use a sentence splitter to split your document into sentences and treat each sentence as a separate document. I have never used this approach myself, but sentence splitters are available in the openNLP package for R, as well as in Python and RapidMiner.
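
The reason for splitting is that LDA infers topics from how words co-occur across many documents, and a single document gives it only one word-count vector to work with. A minimal sketch of the splitting step follows, using only the Python standard library. The function name `split_into_pseudo_documents`, the `min_words` threshold, and the regex-based splitter are illustrative assumptions, not the tool the answer refers to; a dedicated sentence splitter would handle abbreviations and other edge cases better.

```python
import re


def split_into_pseudo_documents(text, min_words=3):
    """Split one document into sentence-level pseudo-documents.

    Hypothetical helper: a naive regex splitter, not a full NLP tool.
    """
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop fragments too short to carry any topical signal.
    return [s for s in sentences if len(s.split()) >= min_words]


document = (
    "Topic models need many documents. A single text gives only one "
    "word-count vector. Splitting it into sentences yields several short "
    "pseudo-documents. Ok."
)

# Each sentence now plays the role of one document in the LDA input.
pseudo_docs = split_into_pseudo_documents(document)
for doc in pseudo_docs:
    print(doc)
```

The resulting list can be passed to any LDA implementation in place of a real corpus. Very short sentences give sparse counts, so splitting by paragraph instead may work better for longer texts.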

Alternatively, you could train a topic model on a corpus with clearly defined topics, then apply the same model to your single document and see how its topic structure turns out.
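
This second approach can be sketched with scikit-learn, assuming that library is available. The toy corpus, the choice of two topics, and the variable names are all made up for illustration; a real reference corpus would need to be far larger.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy reference corpus with two fairly distinct themes (illustrative only).
corpus = [
    "the team won the football match with a late goal",
    "the striker scored a goal and the team celebrated",
    "the bank raised interest rates to fight inflation",
    "inflation and interest rates worry the central bank",
]
# The single document whose topic structure we want to inspect.
single_document = [
    "the goal came late in the match while rates and inflation rose"
]

# Learn the vocabulary and word counts from the reference corpus.
vectorizer = CountVectorizer(stop_words="english")
corpus_counts = vectorizer.fit_transform(corpus)

# Fit LDA on the corpus; n_components=2 is an assumption for this toy data.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(corpus_counts)

# Reuse the same vocabulary and the fitted model on the single document.
new_counts = vectorizer.transform(single_document)
topic_mixture = lda.transform(new_counts)  # shape: (1, n_components)
print(topic_mixture)
```

The single row of `topic_mixture` is the document's estimated distribution over the topics learned from the corpus. Words in the document that never appeared in the training corpus are ignored by the vectorizer, so the corpus should cover the document's vocabulary reasonably well.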

