Solved – Low recall and high precision in text summarization

classification, maximum-entropy, rapidminer, text mining

We are trying to build a model to summarize Persian news. About 14,000 news articles were summarized by humans (supervised); we then extracted all of their sentences (about 180,000) and labeled them (true if the sentence was selected for the summary, false if not). We also computed 9 features for each sentence (all features are in the range 0–1). Finally, we used a MaxEnt classifier (logistic regression) for binary classification on our data set.
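The setup described above can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn rather than RapidMiner, with randomly generated stand-in data (not the actual Persian news corpus):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 180,000 sentences x 9 features in [0, 1].
# The labels are synthetic; in the real task they come from the
# human-made summaries (1 = sentence selected for the summary).
rng = np.random.default_rng(0)
X = rng.random((180_000, 9))
y = (X[:, 0] + rng.normal(0, 0.3, 180_000) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# MaxEnt for binary classification is logistic regression.
clf = LogisticRegression().fit(X_train, y_train)
```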

And here is the result:

[screenshot: RapidMiner performance output with per-class precision and recall]

I really don't know whether low recall and high precision is normal for our classifier, or whether something is wrong in our work. Can anyone come up with an explanation?

Best Answer

Users often prefer higher precision to higher recall. For example, you (and Google) really want the first 15 or 30 results for a search-engine query to be accurate (= high precision), but neither of you is particularly concerned about missing one or two of the millions of pages on the web (= low recall). Obviously, this depends a bit on the application, but I would imagine summarization is similar.

The bad news is that RapidMiner has apparently computed precision and recall for each class separately, and it looks like your recall is quite bad (~13%) on the in-summary class, which is presumably more important than the not-in-summary class.
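Per-class precision and recall can also be checked outside RapidMiner. Here is a toy illustration with scikit-learn; the labels below are made up to mimic the high-precision/low-recall pattern on the positive class, not your actual results:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: 8 in-summary (1) and 24 not-in-summary (0) sentences.
# The "classifier" finds only 1 of the 8 positives, but every positive
# prediction it makes is correct.
y_true = np.array([1] * 8 + [0] * 24)
y_pred = np.array([1] * 1 + [0] * 7 + [0] * 24)

prec, rec, _, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
print(f"not-in-summary: precision={prec[0]:.2f}, recall={rec[0]:.2f}")
print(f"in-summary:     precision={prec[1]:.2f}, recall={rec[1]:.2f}")
# In-summary precision is 1.0 but recall is only 1/8 = 0.125, roughly
# the ~13% pattern visible in the screenshot.
```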

I suspect that this is because you have fairly unbalanced classes--it looks like there are ~3x more not-in-summary examples in your data set. You could tweak the logistic regression's decision threshold (probably 0.5 by default): lowering it makes the classifier predict the positive class more often, trading precision for recall. Alternatively, many implementations let you give extra weight to the positive examples during training.
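Threshold tuning can be sketched like this (again assuming scikit-learn and synthetic stand-in data with roughly the ~3:1 imbalance described):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced data: ~3x more negatives than positives.
rng = np.random.default_rng(0)
X = rng.random((4_000, 9))
y = (X[:, 0] > 0.75).astype(int)  # ~25% positives

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # P(in-summary)

# Default decision rule is "predict positive iff P >= 0.5".
# Lowering the threshold can only add positive predictions, so positive
# recall is non-decreasing as the threshold drops (precision may fall).
for threshold in (0.5, 0.3, 0.2):
    pred = (proba >= threshold).astype(int)
    tp = ((pred == 1) & (y == 1)).sum()
    print(f"threshold={threshold}: positive recall={tp / (y == 1).sum():.2f}")
```

scikit-learn's `LogisticRegression` also accepts `class_weight='balanced'` (or an explicit weight dict), which reweights the classes at training time instead of adjusting the threshold afterwards.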