Solved – Automatic keyword generation evaluation

text mining

I have a simple text analyzer which generates keywords for a given input text. Until now I have been evaluating it manually, i.e., selecting the keywords of a text by hand and comparing them against the ones generated by the analyzer.

Is there any way in which I can automate this? I tried googling a lot for some free keyword generators which can help in this evaluation but have not found any till now. I would appreciate any suggestions on how to go about this.

Best Answer

There are several ways to evaluate keywords ...

Standalone (evaluating only one generator at a time)

According to Wikipedia (which treats "index term" as a synonym for keyword in information retrieval), a keyword is

(a) term that captures the essence of the topic of a document

which can mean either a summarizing term that does not appear in the document at all (hard for machines), or a term that appears often in the document, possibly in variations (easy for machines), but not too often (otherwise it is probably just a common word like "and"). A commonly used method for the latter is the TF-IDF score.
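As a rough illustration of the "often, but not too often" idea, here is a minimal TF-IDF sketch using only the Python standard library. The function name `tf_idf`, the toy documents, and the particular weighting (relative term frequency times the plain logarithm of inverse document frequency) are assumptions made for this example; real libraries use several different variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # Frequent in this document, rare across the collection -> high score.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat and the cat slept".split(),
    "the dog sat on the log".split(),
    "the bird sang and the dog barked".split(),
]
scores = tf_idf(docs)
# Terms of the first document, best keyword candidates first.
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
```

In this toy corpus "the" occurs in every document, so its score is zero, while "cat" (frequent in the first document, absent elsewhere) ranks first. Where to put the cut-off between keyword and non-keyword is still a judgment call.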

But what do "often" and "too often" mean? This is unclear; it is in the eye of the beholder, and that is exactly why this sort of standalone validation is not possible.

Comparing the output of two keyword generators

... for the same set of documents. Assuming that you trust one of the generators and hence use it as a reference, you can calculate the overlap using, e.g., the Jaccard index.

As a result, the keywords of your generator are as valid as the ones from the reference generator, but not necessarily valid or useful per se.
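The overlap computation can be sketched in a few lines. The Jaccard index is the size of the intersection of two sets divided by the size of their union; the helper name `jaccard` and the two keyword lists are made up for illustration, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard index of two keyword collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical output of your generator and of the trusted reference generator.
mine = ["keyword", "extraction", "evaluation", "text"]
reference = ["keyword", "evaluation", "tf-idf"]

score = jaccard(mine, reference)  # 2 shared keywords out of 5 distinct ones
```

Averaging this score over all documents gives a single number for how closely the two generators agree. The same function also works if the reference is a manually selected keyword list instead of a second generator.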

Evaluating the keyword relevance for an application

... to illustrate why standalone validation is not possible.

Suppose you have two documents, each containing the following words (among other, useless ones):

  • document A: love, feeling
  • document B: hate, feeling

and 100000 more documents, all about statistics, in which none of these words appears.

Now you have to pick one, only one. Which one is the best? It depends ...

  • If you want to cluster the documents according to their topic, you have to use feeling.
  • If you want to create a sentiment classifier, which labels all documents as positive, negative or neutral, you have to use love and hate, because otherwise you cannot distinguish the two documents.
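The two cases can be made concrete with a small sketch. It only looks at documents A and B from the example; the helper `group_by_keywords` is hypothetical and simply lists which documents contain each chosen keyword.

```python
# Documents A and B from the example, reduced to their interesting words.
docs = {
    "A": {"love", "feeling"},
    "B": {"hate", "feeling"},
}

def group_by_keywords(docs, keywords):
    """Map each keyword to the sorted names of the documents that contain it."""
    return {
        kw: sorted(name for name, words in docs.items() if kw in words)
        for kw in keywords
    }

# Topic clustering: "feeling" puts A and B into the same group.
topic_groups = group_by_keywords(docs, ["feeling"])

# Sentiment classification: "love" and "hate" keep A and B apart.
sentiment_groups = group_by_keywords(docs, ["love", "hate"])
```

Neither result is better in itself: the first keyword choice merges the two documents and the second separates them, and which behaviour you want depends on the application.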

In summary, one can easily evaluate whether a set of keywords is useful for an application, be it a sentiment classifier, a spam detector or a search engine. But a keyword that is useful for one application is not necessarily useful for another.

Update

This seems to be a rule of the internet: everything you can think of is probably already a research discipline. In this case it is called Terminology Extraction.