General approaches for improving an SVM-based classifier with low precision and high recall

data mining, libsvm, machine learning, svm, text mining

I built an SVM-based classifier on a data set; its precision is about 66% and its recall is about 88%. In general, what are the options for tuning the parameters to increase the precision?

Best Answer

I've used the approach described in this paper with some success: Cohen, 2006

Although the data used in this paper is specific to the biomedical literature, the approach gets at a larger issue in machine learning: the trade-off between precision and recall. In biomedicine, we generally deal with highly skewed data (i.e., one or more rare classes and one prominent class), which is sometimes the source of the results you're describing. For example, if I'm classifying 100 data points, 95 of which belong to class A and 5 of which belong to class B, many machine learning algorithms (SVM included) will classify everything, or nearly everything, as class A. That yields great recall on class A while the rare class B is missed almost entirely, so the per-class precision and recall come out badly lopsided.
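To make the arithmetic in the 95/5 example concrete, here is a minimal sketch in plain Python. It is not taken from the Cohen paper; the helper name `precision_recall` and the choice to report an undefined precision as `None` are my own assumptions for illustration. It scores a degenerate classifier that labels all 100 points as class A, once with A treated as the positive class and once with B.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for one class.

    Hypothetical helper for illustration. A value is None when its
    denominator is zero (e.g. nothing was predicted as `positive`).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# The skewed data set from the example: 95 points in class A, 5 in class B.
y_true = ["A"] * 95 + ["B"] * 5
# A degenerate classifier that assigns every point to the majority class.
y_pred = ["A"] * 100

prec_a, rec_a = precision_recall(y_true, y_pred, "A")
prec_b, rec_b = precision_recall(y_true, y_pred, "B")

print("class A: precision =", prec_a, "recall =", rec_a)
print("class B: precision =", prec_b, "recall =", rec_b)
```

Which class you treat as "positive" changes the numbers a lot, so when you report precision and recall on skewed data it is worth stating which class they refer to.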