Solved – Calculating precision and recall for LDA

machine learning, precision-recall, topic-models

As I understand it, the Latent Dirichlet Allocation (LDA) algorithm produces two matrices: one for document-topic assignments and one for topic-term assignments. Since LDA is an unsupervised machine learning algorithm, how can I calculate the accuracy of its output? Is there a common way to calculate precision and recall for topic assignments?

Best Answer

Since precision and recall necessarily depend on the notion of true classes for a datum, they can't be directly applied to an unsupervised method. One can evaluate clustering methods, but accuracy is not an applicable criterion.

To see how this pans out in the LDA case, consider running LDA on the AP corpus. In a supervised setting, what might your predicted label(s) be? It could be as simple as the section of the paper an article was drawn from, e.g. sports, world, politics. This is roughly what we humans might naturally think of when we hear "topic."

What about a piece on the front page of a local paper, mentioning a politician's appearance at a college sporting event? What if, instead of attending a game, the politician met with NCAA athletes to discuss student-athlete compensation and healthcare? Are these sports, or politics? Must they be one or the other?

Point being, these answers require human decision-making based on the application. LDA reconciles the ambiguity by representing the article as a distribution over topics, and each topic as a distribution over words: It has no notion of whether a document belongs to one or several human-meaningful classes.
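To make the point above concrete, here is a minimal sketch of what forcing a single class onto LDA output throws away. The topic names and proportions are made up for illustration (LDA itself only returns unnamed topic indices), and `hard_assignment` is a hypothetical helper, not part of any library.

```python
# Hypothetical document-topic proportions for two articles over three topics.
# The topic names are labels a human attached afterwards; LDA only yields indices.
topics = ["sports", "politics", "health"]
doc_topic = {
    "game_recap": [0.90, 0.05, 0.05],
    "ncaa_compensation_meeting": [0.40, 0.45, 0.15],
}


def hard_assignment(proportions):
    """Collapse a topic distribution to its single most probable topic.

    Returns (topic index, probability mass of that topic).
    """
    best = max(range(len(proportions)), key=lambda k: proportions[k])
    return best, proportions[best]


for name, theta in doc_topic.items():
    # Each row is a probability distribution over topics.
    assert abs(sum(theta) - 1.0) < 1e-9
    k, mass = hard_assignment(theta)
    print(f"{name}: argmax topic = {topics[k]}, "
          f"mass kept = {mass:.2f}, mass discarded = {1 - mass:.2f}")
```

For the second article the argmax keeps less than half of the probability mass, which is the sports-or-politics ambiguity from the example: any "true class" you score against is a modelling decision you made, not something LDA produced.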

That said, one can certainly run an unsupervised method, transform data using that method, and subsequently examine how a supervised method performs on the transformed data. Indeed, comparing a given method's precision and recall when used on the same corpus transformed by LDA and TF-IDF would be fully sensible.
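A rough sketch of that comparison follows. It assumes you already have human-assigned labels for a held-out set and two lists of predictions from the same classifier, one trained on LDA features and one on TF-IDF features. The label lists below are invented for illustration, and the feature extraction and classifier training are deliberately left out.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical held-out labels and predictions (not real results).
y_true = ["sports", "sports", "politics", "politics", "sports", "politics"]
pred_lda = ["sports", "politics", "politics", "politics", "sports", "sports"]
pred_tfidf = ["sports", "sports", "politics", "sports", "sports", "politics"]

for name, y_pred in [("LDA features", pred_lda), ("TF-IDF features", pred_tfidf)]:
    p, r = precision_recall(y_true, y_pred, positive="sports")
    print(f"{name}: precision = {p:.2f}, recall = {r:.2f}")
```

Note that what is being scored here is the downstream classifier against human labels. The numbers say how useful each representation is for that particular task, not how "accurate" LDA's topics are.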

You can find a broad introduction to cluster validation here, and this paper from the comments details means of evaluating LDA, specifically.