LDA vs. Labeled LDA

data mining, latent-dirichlet-alloc, machine learning, natural language, topic-models

I have gone through both techniques and understand the basic ideas. But I want to know which one is usually expected to work better, LDA or Labeled LDA. What features of a dataset help decide between the two?

Best Answer

I think it is not really a question of better or worse, but of what data you have available and how much interpretability matters. If your data is at least partially labeled, whether with traditional topic classes or with something like hashtags, then Labeled LDA may be worth pursuing; otherwise it isn't an option, because it needs labels to work from. Labeled LDA ties each topic to one label and restricts each document to the topics of its own labels, so the induced topics are built to correspond to human categorization, as represented by the provided label space. This is often very useful for applications with a human in the loop. It's not the same thing as maximizing data likelihood, though.
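To make the constraint concrete, here is a minimal collapsed Gibbs sampling sketch. It is an illustration, not any particular library's implementation. Plain LDA lets every token take any topic; the Labeled LDA variant uses one topic per label and only samples a document's tokens from the topics of that document's labels. The function name `gibbs_lda`, the `allowed` argument, the toy corpus, and the hyperparameters (`alpha=0.1`, `beta=0.01`) are all hypothetical choices for illustration.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, allowed=None,
              alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA, with an optional Labeled-LDA constraint.

    docs:    list of documents, each a list of integer word ids in [0, vocab_size)
    allowed: None for plain LDA; otherwise, per document, the list of topic ids
             (hypothetically one topic per label) its tokens may be assigned to
    Returns (z, n_kw): per-token topic assignments and topic-word counts.
    """
    rng = random.Random(seed)
    # Topics each document may use: all of them (LDA) or only its labels (Labeled LDA).
    topic_sets = [list(range(num_topics)) if allowed is None else list(allowed[d])
                  for d in range(len(docs))]
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic

    # Random initial assignment, restricted to each document's allowed topics.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(topic_sets[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta),
                # evaluated only over the topics this document may use.
                ks = topic_sets[d]
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in ks]
                k = rng.choices(ks, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_kw


if __name__ == "__main__":
    # Hypothetical toy corpus; label 0 = "sports", label 1 = "politics".
    vocab = ["goal", "match", "striker", "vote", "senate", "ballot"]
    docs = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 2, 4, 5]]
    labels = [[0], [1], [0, 1]]
    _, n_kw = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab),
                        allowed=labels, iters=50)
    for k, name in enumerate(["sports", "politics"]):
        top = sorted(range(len(vocab)), key=lambda w: -n_kw[k][w])[:3]
        print(name, [vocab[w] for w in top])
```

Documents 0 and 1 each carry a single label, so every one of their tokens is pinned to that label's topic; that is what anchors topic 0 to "sports" and topic 1 to "politics". With `allowed=None` the same sampler is ordinary LDA, and the topic numbers carry no predefined human meaning.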

In summary, if you have some labels and human interpretability of the topics is important to you, then Labeled LDA will likely produce better results.