Solved – How does a classifier handle unseen documents that do not belong to any of the pre-existing classes

Tags: classification, machine-learning, text-mining

I am doing text classification, and have been playing around with different classifiers. However, I have a pretty basic question: what if a new unseen document comes in and it happens to not belong to any of the pre-existing classes? The classifiers that I have seen (in WEKA, LIBSVM, etc.) still go ahead and put the unseen document in one of the existing classes anyway.

This situation comes up pretty frequently in my work. How can I handle it sensibly?

On a related note, is there a way I could get a sense of how confident a classifier is regarding its classification decision?

================================Edit 1======================================

After reading the comments here, I think I am going to try one of two things:

(1) Use the approach described in the Zadrozny and Elkan paper that Steffen pointed me to, in order to get probability estimates. If the winning class's probability falls below some chosen threshold, I could then simply discard the unseen instance as noise.
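A rough sketch of the thresholding idea, using scikit-learn's `CalibratedClassifierCV` rather than the paper's exact procedure (the threshold value and the toy data here are purely illustrative):

```python
# Calibrate a margin classifier's scores into probabilities, then reject
# any prediction whose top-class probability is below a chosen threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Sigmoid (Platt) calibration; 'isotonic' is another option discussed
# in the calibration literature.
clf = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="sigmoid", cv=5)
clf.fit(X, y)

THRESHOLD = 0.7  # the "magic threshold" -- tune it on held-out data
proba = clf.predict_proba(X[:5])
labels = np.where(proba.max(axis=1) >= THRESHOLD,
                  clf.classes_[proba.argmax(axis=1)],
                  -1)  # -1 = rejected as "none of the above"
```

The rejected instances could then be set aside for manual review or treated as candidates for a new class.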

(2) I am increasingly starting to think that I could instead handle this as n one-class problems. Let's say I have n classes in my data. I could build and train n one-class classifiers, each giving a yes/no decision on whether an instance belongs to a particular class. That way, when a new instance comes in, I could just pass it through each of these one-class classifiers.

Thoughts? Is anyone aware of implementations/packages that would let me do (2)?
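For what it's worth, approach (2) can be sketched with scikit-learn's `OneClassSVM`: train one model per class, and flag an instance as novel if every per-class model rejects it. The data, `nu`, and the tie-breaking rule below are all illustrative choices, not a recommendation:

```python
# Approach (2): one OneClassSVM per class; an instance that every
# per-class model rejects is treated as "none of the above".
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# Toy 2-D data: two known classes with well-separated centres
train = {0: rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
         1: rng.normal(loc=5.0, scale=0.5, size=(100, 2))}

models = {c: OneClassSVM(nu=0.05, gamma="scale").fit(X)
          for c, X in train.items()}

def classify(x):
    """Return the first accepting class, or None if every model rejects x."""
    accepted = [c for c, m in models.items() if m.predict([x])[0] == 1]
    return accepted[0] if accepted else None

print(classify([0.0, 0.0]))    # near class 0's centre
print(classify([20.0, 20.0]))  # far from both classes -> novel
```

Note that more than one model may say "yes" for an ambiguous instance; here the tie is broken arbitrarily, but you could instead compare the models' decision-function values.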

===========================================================================

Best Answer

It sounds to me like the problem is one of "novelty detection": you want to identify test patterns of a type not seen in the training data. This can be achieved using the one-class support vector machine, which (IIRC) tries to construct a small volume in a kernel-induced feature space that contains all, or a large fraction, of the training set; any novel patterns encountered in operation are then likely to fall outside this boundary. There are loads of papers on uses of the one-class SVM indexed on Google Scholar, so it should be easy to find something relevant to your application.
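As a minimal illustration of that idea (using scikit-learn's `OneClassSVM`; the data and the `nu` parameter are made up for the example): fit a single boundary around all of the training patterns, pooled across classes, and flag test points that fall outside it.

```python
# Fit a one-class SVM to the whole training set and use it purely as a
# novelty detector, independent of the multi-class classifier.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)
X_train = rng.normal(size=(200, 2))  # all training data, classes pooled

# nu bounds the fraction of training points allowed outside the boundary
detector = OneClassSVM(nu=0.05, gamma="scale").fit(X_train)

X_test = np.array([[0.0, 0.0],    # looks like the training data
                   [8.0, 8.0]])   # far outside the training region
pred = detector.predict(X_test)   # 1 = inlier, -1 = novelty
```

A test document flagged as a novelty would then be withheld from the ordinary classifier.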

Another approach would be to build a classifier by constructing a density estimator for each class, and then combine them using Bayes' rule for the classification. If the likelihood of a test observation under the winning class is low, you can then reject the pattern as a possible novelty.
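One way this might look in practice, sketched with scikit-learn's `GaussianMixture` as the per-class density estimator (the mixture size, the log-likelihood cutoff, and the toy data are all illustrative assumptions):

```python
# One density model per class; Bayes' rule picks the winner, and a
# likelihood threshold rejects points no class explains well.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(2)
classes = {0: rng.normal(0.0, 0.5, size=(150, 2)),
           1: rng.normal(4.0, 0.5, size=(150, 2))}

densities = {c: GaussianMixture(n_components=1).fit(X)
             for c, X in classes.items()}
total = sum(len(X) for X in classes.values())
priors = {c: len(X) / total for c, X in classes.items()}

def classify_or_reject(x, reject_below=-10.0):
    x = np.asarray(x).reshape(1, -1)
    loglik = {c: d.score_samples(x)[0] for c, d in densities.items()}
    # Bayes' rule: winner maximises log-likelihood + log-prior
    winner = max(loglik, key=lambda c: loglik[c] + np.log(priors[c]))
    # Reject when even the winning class explains the point poorly
    return winner if loglik[winner] >= reject_below else None

print(classify_or_reject([0.1, 0.2]))    # plausible under class 0
print(classify_or_reject([50.0, 50.0]))  # implausible under both -> rejected
```

Note the rejection test is on the class-conditional likelihood, not the posterior: a point far from all classes can still have a confident-looking posterior, which is exactly the failure mode being guarded against.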