The usual approach is to use Platt's method of fitting a univariate logistic regression model to the output of the SVM. However, if you want a probabilistic output, it is probably better to go for kernel logistic regression, which estimates the probabilities directly, rather than training a discriminative classifier and post-processing the output.
Gaussian process classification is another method that may be better suited; see the excellent book by Rasmussen and Williams, and the equally excellent MATLAB toolbox that goes with it.
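As an aside, scikit-learn exposes Platt's method directly: passing probability=True to SVC fits a sigmoid to the decision values via internal cross-validation. A minimal sketch, on made-up toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, purely for illustration
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# probability=True fits Platt's sigmoid to the SVM scores internally
clf = SVC(kernel='rbf', probability=True).fit(X, y)
proba = clf.predict_proba(X[:5])  # one row per sample, rows sum to 1
```

Keep in mind these probabilities come from post-processing the SVM scores, which is exactly the caveat above.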
The Multi-label algorithm accepts a binary mask over multiple labels. So, for example, you could do something like this:
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

data = [
    [[0.1, 0.6, 0.0, 0.3], 1, 10, 0, 0, 0],
    [[0.7, 0.3, 0.0, 0.0], 0, 7, 22, 0, 0],
    [[0.0, 0.0, 0.6, 0.4], 0, 0, 6, 0, 20],
    # ...
]
X = np.array([d[1:] for d in data])
yvalues = np.array([d[0] for d in data])
# Binarize each row's label set into a 0/1 indicator vector
Y = MultiLabelBinarizer().fit_transform(yvalues)
clf = OneVsRestClassifier(SVC(kernel='poly'))
clf.fit(X, Y)
clf.predict(X)  # in practice, predict on a new X
The result for each prediction will be an array of 0s and 1s marking which class labels apply to each row input sample.
Given your data, though, I'm not sure this is what you want to do. For example, the third point has zero listed twice, which makes me think that you're not predicting multiple labels in an unordered OneVsRest manner, but actually predicting multiple ordered columns of labels: in that case, it might make sense to do a separate classification for each, e.g.
X = np.array([d[1:] for d in data])
Y = np.array([d[0] for d in data])
clfs = [SVC().fit(X, Y[:, i]) for i in range(Y.shape[1])]
Ypred = np.array([clf.predict(X) for clf in clfs]).T
With other classifiers, such as RandomForestClassifier, you can do this column-by-column prediction in one operation, e.g.
from sklearn.ensemble import RandomForestClassifier
X = np.array([d[1:] for d in data])
Y = np.array([d[0] for d in data])
RandomForestClassifier().fit(X, Y).predict(X)
Of course, the array passed to predict should generally be different from the array passed to fit, but hopefully this makes the distinction clear.
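As a further option (not in the original code above): scikit-learn's MultiOutputClassifier wraps any base estimator and does the same per-column fitting in one object. A sketch on made-up data of similar shape:

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

# Made-up data: 5 features, 4 label columns
rng = np.random.RandomState(0)
X = rng.rand(20, 5)
Y = rng.randint(0, 3, size=(20, 4))

# Fits one SVC per output column, like the explicit loop over Y.shape[1]
multi = MultiOutputClassifier(SVC()).fit(X, Y)
Ypred = multi.predict(X)  # shape (20, 4), one prediction per column
```

This gives SVC the same column-wise convenience that RandomForestClassifier has natively.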
Best Answer
A likely cause is that you are not tuning your model: you need to find good values for $C$ and $\gamma$. In your case the defaults turn out to be bad, which leads to trivial models that always predict a certain class. This is particularly common when one class has many more instances than the others. What is your class distribution?
scikit-learn has limited hyperparameter search facilities, but you can use it together with a tuning library like Optunity. An example of tuning scikit-learn SVC with Optunity is available here.
Disclaimer: I am the lead developer of Optunity.
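For reference, a minimal tuning sketch using scikit-learn's own GridSearchCV (toy imbalanced data assumed; Optunity's API differs):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy imbalanced data: 90 samples of one class, 10 of the other
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Log-spaced grid over C and gamma; cross-validation is stratified by default
grid = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': 10.0 ** np.arange(-2, 3),
                'gamma': 10.0 ** np.arange(-3, 2)},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # tuned values of C and gamma
```

Even this coarse grid usually beats the defaults; with heavy imbalance, also consider class_weight='balanced' on the SVC.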