I'm working on an anomaly detection problem and am currently exploring which algorithm is the best fit for my use case.
I've been looking at the one-class SVM in the scikit-learn library for Python. It has a parameter nu
that you pass in, which roughly determines what fraction of the training data it is allowed to mislabel as anomalous.
This might simply be a lack of understanding of something fundamental about the one-class SVM, but I am wondering why I can't set the nu
parameter to zero. In my application, it is more important to avoid labeling a normal point as an anomaly than it is to catch every actual anomaly.
Do I simply need more data so that I can set a very low nu?
Best Answer
As @Joe already mentioned, $\nu$ is an upper bound on the fraction of training points labeled as outliers and a lower bound on the fraction of support vectors.
Mathematically, the quadratic programming minimization problem for the one-class SVM is (in Schölkopf's standard formulation, with $n$ training points):
$$\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\|w\|^2 \;+\; \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \;-\; \rho \quad \text{s.t.} \quad (w \cdot \phi(x_i)) \ge \rho - \xi_i,\ \ \xi_i \ge 0.$$
So if $\nu$ is too small, the problem becomes a hard-margin problem: the coefficient $\frac{1}{\nu n}$ of the slack term goes to infinity, so no training errors are tolerated. The algorithm then finds the unique supporting hyperplane that separates all the data from the origin and whose distance to the origin is maximal among all such hyperplanes, which, as you said, gives 100% training accuracy. You can try setting
nu
to a small positive value rather than 0.
Presumably the package disallows $\nu = 0$ exactly because it would put an infinite coefficient in the cost function.
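As a minimal sketch of the workaround, here is scikit-learn's `OneClassSVM` fit with a small but nonzero `nu` (the dataset and the value `nu=0.01` are illustrative assumptions, not from the question). The library restricts `nu` to the interval $(0, 1]$, so `nu=0` is rejected at fit time:

```python
# Sketch: one-class SVM with a small nonzero nu (nu=0 is not allowed).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))  # illustrative "normal" training data

# A small nu tolerates almost no training points being flagged as anomalies.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01)
clf.fit(X_train)

# predict() returns +1 for inliers and -1 for anomalies.
preds = clf.predict(X_train)
frac_flagged = float(np.mean(preds == -1))
print("fraction of training data flagged:", frac_flagged)

# nu must lie in (0, 1]; nu=0 raises a ValueError when fitting.
raised = False
try:
    OneClassSVM(nu=0.0).fit(X_train)
except ValueError:
    raised = True
print("nu=0 rejected:", raised)
```

With `nu=0.01`, only about 1% of the training set ends up on or outside the boundary, which approximates the "don't mislabel normal points" behavior the question asks for without hitting the infeasible `nu=0` case.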