Solved – libsvm one-class SVM: how to consider all data to be in-class


I am using Libsvm for Matlab.

I would like to construct a model that fully encloses all training data (in the higher-dimensional SVM feature space). For this I assume all my training data is correct and contains no outliers.

I generate normally distributed data (which is likely to resemble my real-world data) and train a one-class SVM on it. When I predict the labels of that same data, almost all of the data points used as support vectors are also classified as outside the class. Is that the correct behavior?

How can I construct an SVM model that considers all data to be in-class?

The following code gives an example. In the resulting scatterplot the blue circles are all the data points, the red circles are the support vectors used by the model, and the green circles mark the points predicted to be inside the class. So the red circles without a green center are support vectors that are nevertheless out-of-class.

I have tried to adjust the nu parameter (-n, default 0.5), but that only changes the ratio of data points to support vectors. Most of the support vectors are still out-of-class.

% Generate 1000 normally distributed 2-D points
data = normrnd(0, 1, 1000, 2);
labels = ones(size(data, 1), 1);  % labels are ignored by the one-class SVM

% Train a one-class SVM (-s 2) with an RBF (Gaussian) kernel (-t 2)
model = svmtrain(labels, data, '-s 2 -t 2');

% Predict on the same data: +1 = inside the class, -1 = outside
[predicted_labels] = svmpredict(labels, data, model);
inside_indices = find(predicted_labels > 0);

figure; hold on;
% All data points: blue circles
scatter(data(:,1), data(:,2), 30, 'blue');

% Support vectors used by the model: smaller red circles
scatter(model.SVs(:,1), model.SVs(:,2), 20, 'red');

% Points predicted inside the one-class: small green circles
scatter(data(inside_indices,1), data(inside_indices,2), 10, 'green');

The resulting scatterplot: [scatterplot image]
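As a cross-check outside MATLAB (my addition, not part of the original question), the behavior above can be reproduced with scikit-learn's `OneClassSVM`, which wraps the same libsvm implementation. With libsvm's defaults (nu = 0.5, gamma = 1/num_features), roughly half of the training points end up predicted out-of-class, matching the scatterplot:

```python
# Sketch: reproduce the question's setup with scikit-learn's OneClassSVM,
# which wraps libsvm. The MATLAB call svmtrain(labels, data, '-s 2 -t 2')
# is assumed to correspond to OneClassSVM with an RBF kernel and libsvm's
# default nu = 0.5 and gamma = 1/num_features.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 2))   # like normrnd(0, 1, 1000, 2)

# gamma = 1/num_features = 0.5 for 2-D data, mirroring the libsvm default
model = OneClassSVM(kernel="rbf", nu=0.5, gamma=0.5)
model.fit(data)

pred = model.predict(data)              # +1 = inside, -1 = outside
inside_frac = np.mean(pred == 1)
print(f"fraction predicted in-class: {inside_frac:.2f}")
```

With nu at its default of 0.5, roughly half the training data is allowed to fall outside the boundary, which is exactly the symptom described above.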

Edit: I may have found a solution: the LIBSVM Tools page contains an extension for "Support Vector Data Description" (SVDD), for "finding the smallest sphere containing all data": http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#libsvm_for_svdd_and_finding_the_smallest_sphere_containing_all_data

Edit2: Using the SVDD tool does not make any difference. I train with -s 5, but I still only get around 50% accuracy on the same data set.

My question still holds: how can I describe all data with a one-class SVM?

Best Answer

If you set the nu parameter very small (-n 0.001) and set gamma small as well (-g 0.001), almost all of your training data will be classified as in-class. In a one-class SVM, nu is an upper bound on the fraction of training points allowed to fall outside the boundary, so shrinking it toward zero forces the model to enclose nearly everything; a small gamma widens the RBF kernel so that a single smooth boundary can cover all the points.
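A sketch of this answer's suggestion, again using scikit-learn's `OneClassSVM` wrapper around libsvm rather than the MATLAB interface (an assumption on my part; the parameters mirror the flags -n 0.001 -g 0.001):

```python
# Sketch: with both nu and gamma shrunk, the one-class SVM's decision
# boundary encloses essentially all of the training data.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 2))   # like normrnd(0, 1, 1000, 2)

# Equivalent of libsvm flags '-s 2 -t 2 -n 0.001 -g 0.001'
model = OneClassSVM(kernel="rbf", nu=0.001, gamma=0.001)
model.fit(data)

inside_frac = np.mean(model.predict(data) == 1)
print(f"fraction predicted in-class: {inside_frac:.3f}")
```

Since nu bounds the fraction of training outliers from above, nu = 0.001 leaves at most about 0.1% of the points outside, so the printed fraction should be very close to 1.0.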