Solved – Best way to train one-class SVM

novelty-detectionone-classscikit learnsvm

Let's say I have training data which contains 10 classes and have built a classifier using this data.

When applying this classifier in real life, it may encounter examples that do not belong to any of the classes in the training data. I want to build a novelty detector to reject these examples. I am considering using the one-class SVM from sklearn and have 2 options:

  • Use all training data as a single positive class to train one one-class SVM
  • Train 10 one-class SVM models, one for each class in the training data

Which way is better and why?

Best Answer

If the goal is to determine whether the classifier you've already built can be applied to new samples, then the correct answer is to use a single one-class SVM (as implemented in scikit-learn's `OneClassSVM`). Your second idea is a one-vs-all scheme, which is useful for extending binary classifiers to a multi-class setting, but it is incapable of telling you whether you're getting outliers.
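A minimal sketch of the first option, assuming synthetic data in place of your real training set (the `nu` and kernel settings are illustrative assumptions, not recommendations):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical stand-in for training data pooled across all 10 known classes.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 5))

# Fit a single one-class SVM on all training data (option 1).
# nu is an upper bound on the fraction of training points treated as
# outliers; 0.05 is an assumed value you would tune for your data.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(X_train)

# predict() returns +1 for inliers (safe to pass to the 10-class
# classifier) and -1 for novelties (reject the example).
X_new = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 5)),   # resembles the training data
    rng.normal(10.0, 1.0, size=(5, 5)),  # far from the training data
])
labels = detector.predict(X_new)
print(labels)
```

At prediction time you would first run `detector.predict` on each incoming sample, and only apply your 10-class classifier to the samples flagged `+1`.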

edit: As for the why -- there may be features of your data that make these 10 classes as a whole distinct from the unobserved classes, but that are not predictive of any single class. A one-vs-all scheme would miss these, whereas a one-class SVM would take them into account (depending on things like the kernel you use).