Solved – Optimise SVM to avoid false negatives in binary classification

cross-validation, machine learning, python, scikit-learn, svm

I am training an SVM binary classifier using scikit-learn.

Due to the nature of my problem I need to avoid false negatives. Since nothing is free, I am okay with accepting a higher rate of false positives in order to reduce the number of false negatives. How can we do that (ideally with scikit-learn)?

In other words, how can we minimise false negatives using an SVM classifier? Is there some way to tune the hyperparameters in order to favor false positives over false negatives?

Best Answer

The scikit-learn implementation of the SVM binary classifier does not let you set a cutoff threshold as the other comments/replies have suggested. Instead of returning class probabilities, it straight away applies a default cutoff and gives you the class membership, e.g. 1 or 2.
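A minimal sketch of what that looks like in practice (the toy data and linear kernel below are illustrative choices, not from the question):

```python
# SVC.predict() returns hard class labels directly; the cutoff is
# applied internally rather than exposed as a tunable parameter.
from sklearn.svm import SVC

X = [[0.0], [0.2], [0.8], [1.0]]  # toy training data (illustrative)
y = [0, 0, 1, 1]                  # binary labels

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.1], [0.9]]))  # hard labels, e.g. [0 1] -- not probabilities
```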

To minimize false negatives, you could set higher weights for training samples labelled as the positive class; by default the weights are set to 1 for all classes. To change this, use the hyperparameter class_weight.
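A minimal sketch of that idea, assuming the positive class is labelled 1 and using a weight of 10, which is an illustrative value rather than a recommendation:

```python
from sklearn.svm import SVC

X = [[0.0], [0.2], [0.5], [0.8], [1.0]]  # toy training data (illustrative)
y = [0, 0, 0, 1, 1]

# Misclassifying a positive sample now costs 10x as much as misclassifying
# a negative one, pushing the decision boundary towards fewer false
# negatives at the price of more false positives.
clf = SVC(kernel="linear", class_weight={1: 10}).fit(X, y)
print(clf.predict([[0.4], [0.6]]))
```

In practice the weight (or the whole class_weight dict) can itself be tuned with cross-validation against a metric such as recall.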

Ideally, you should avoid choosing a cutoff and simply provide the class probabilities to the end users, who can then decide which cutoff to apply when making decisions based on the classifier.
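A minimal sketch of that approach, assuming the extra cost of Platt scaling (probability=True) is acceptable so that probabilities can be thresholded downstream; the data and threshold value are illustrative only:

```python
from sklearn.svm import SVC

X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# probability=True enables Platt scaling so predict_proba is available.
clf = SVC(kernel="linear", probability=True).fit(X, y)
proba = clf.predict_proba([[0.3], [0.7]])[:, 1]  # P(class == 1)

# The end user picks the cutoff; a low one trades false positives
# for fewer false negatives.
threshold = 0.2
print(proba, proba >= threshold)
```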

A better metric for comparing classifiers is a proper scoring rule; see https://en.wikipedia.org/wiki/Scoring_rule and the score() method of the SVM classifier sklearn.svm.SVC.
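As one hedged example of evaluating with a proper scoring rule, the Brier score (lower is better) can be computed on held-out probabilities; the synthetic data, split, and model settings below are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

clf = SVC(kernel="linear", probability=True).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Brier score: mean squared error between predicted probabilities and outcomes.
print(brier_score_loss(y_te, proba))
```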