How does a one-class SVM model work?

intuition, one-class, outliers, svm

I am working on an outlier detection problem and found that it can be tackled with a one-class SVM. I have been googling it and reading blogs and papers, but I have a doubt that does not seem to be resolved anywhere.

As far as I have read, this paper originated the idea of the one-class SVM, and it says that the idea is to map the data into the feature space and separate it from the origin with maximum margin. However, in other papers and blogs I read that the intuitive idea is to build a hyperplane as small as possible containing all the data, so that the outliers fall on the other side of the hyperplane.

In my opinion, these two descriptions (maximizing the distance to the origin on the one hand, and making the hyperplane as small as possible on the other) are contradictory. What am I misunderstanding?

Best Answer

The idea is to estimate the support of the (unknown) probability distribution from which the samples have been drawn.

Concretely, you would like to have a threshold, $\delta$, so that you only consider points which are not too unlikely to be encountered. Now, how can we express this geometrically?

Given a set of i.i.d. samples, you find the smallest hypersphere that encloses those points (because you consider them to be "OK" points), while allowing for some outliers/errors (as in a soft-margin SVM). The primal formulation is

$$ \min_{R,c,\xi} \; R^{2} + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_{i} $$

subject to

$$ \| \phi(x_{i})-c \|^{2} \leq R^{2} + \xi_{i}, \quad \xi_{i} \ge 0, \quad i = 1, \dots, l, $$

where $R$ is the radius of the sphere, $c$ is its center, $\phi$ is the feature map, $\xi_{i}$ are the slack variables, $l$ is the number of samples, and $\nu \in (0,1)$ controls the trade-off between a small radius and small slack (it upper-bounds the fraction of points left outside the sphere).
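To get a feel for what $\nu$ does in the primal above, here is a minimal sketch under strong simplifying assumptions: $\phi$ is taken to be the identity (plain input space), and the center $c$ is fixed at the sample mean instead of being optimized. Only $R^{2}$ is minimized, so this is not a one-class SVM solver. The helper names (`sq_dist`, `objective`, `best_radius_sq`) and the toy data are made up for illustration.

```python
import random

def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def objective(r2, sq_dists, nu):
    """R^2 + 1/(nu*l) * sum(xi_i), using the tightest slacks xi_i = max(0, d_i - R^2)."""
    slack = sum(max(0.0, d - r2) for d in sq_dists)
    return r2 + slack / (nu * len(sq_dists))

def best_radius_sq(sq_dists, nu):
    """Minimize the objective over R^2 for a FIXED center.

    The objective is convex and piecewise linear in R^2, with kinks at the
    squared distances, so it is enough to try 0 and each squared distance.
    """
    candidates = [0.0] + sorted(sq_dists)
    return min(candidates, key=lambda r2: objective(r2, sq_dists, nu))

# Toy data (hypothetical): 100 Gaussian inliers plus 5 hand-placed far points.
random.seed(0)
inliers = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
outliers = [(8.0, 8.0), (-8.0, 8.0), (8.0, -8.0), (-8.0, -8.0), (0.0, 10.0)]
points = inliers + outliers

# Assumption: the center is the sample mean, not the optimal SVM center.
center = tuple(sum(coord) / len(points) for coord in zip(*points))
dists = [sq_dist(p, center) for p in points]

nu = 0.1
r2 = best_radius_sq(dists, nu)

# Points strictly outside the sphere are the ones treated as outliers.
flagged = [p for p, d in zip(points, dists) if d > r2]

print(f"R^2 = {r2:.3f}, flagged {len(flagged)} of {len(points)} points")
```

With the center fixed, the slope of the objective in $R^{2}$ is $1 - k/(\nu l)$, where $k$ is the number of points outside the sphere, so the minimizer leaves at most $\nu l$ points outside. Here $\nu l = 10.5$, so at most 10 points can be flagged.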

There is another paper by those same authors, "Estimating the support of a high-dimensional distribution", where this is explained in more detail and more clearly.
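For comparison, here is a sketch of the hyperplane version the question describes, written in the same notation as the sphere problem above. The symbols $w$ (normal vector of the hyperplane) and $\rho$ (its offset) are introduced here and do not appear in the formulation above:

$$ \min_{w,\rho,\xi} \; \frac{1}{2}\|w\|^{2} + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_{i} - \rho $$

subject to

$$ \langle w, \phi(x_{i}) \rangle \ge \rho - \xi_{i}, \quad \xi_{i} \ge 0, \quad i = 1, \dots, l. $$

If the kernel satisfies $k(x,x) = \|\phi(x)\|^{2} = \text{const}$ (the Gaussian kernel, for example), all mapped points lie on a sphere around the origin in feature space. Cutting that sphere with a hyperplane pushed as far from the origin as possible leaves a small cap containing the data, which is the same region a small enclosing ball would pick out. As far as I can tell, this is why the two descriptions are not contradictory in that case; for kernels without constant $k(x,x)$ the two problems can give different solutions.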
