The multi-label approach accepts a binary mask over multiple labels. So, for example, you could do something like this:
data = [
[[0.1 , 0.6, 0.0, 0.3], 1, 10, 0, 0, 0],
[[0.7 , 0.3, 0.0, 0.0], 0, 7, 22, 0, 0],
[[0.0 , 0.0, 0.6, 0.4], 0, 0, 6, 0, 20],
#...
]
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

X = np.array([d[1:] for d in data])
yvalues = np.array([d[0] for d in data])
# Create a binary indicator array marking which labels apply to each sample
Y = MultiLabelBinarizer().fit_transform(yvalues)
clf = OneVsRestClassifier(SVC(kernel='poly'))
clf.fit(X, Y)
clf.predict(X)  # in practice, predict on new data rather than the training X
The result for each prediction will be an array of 0s and 1s marking which class labels apply to the input sample.
Given your data, though, I'm not sure this is what you want to do. For example, the third point has zero listed twice, which makes me think that you're not predicting multiple labels in an unordered OneVsRest manner, but actually predicting multiple ordered columns of labels. In that case, it might make sense to fit a separate classifier for each column, e.g.
X = np.array([d[1:] for d in data])
Y = np.array([d[0] for d in data])
clfs = [SVC().fit(X, Y[:, i]) for i in range(Y.shape[1])]
Ypred = np.array([clf.predict(X) for clf in clfs]).T
With other classifiers, such as RandomForestClassifier, you can do this column-by-column prediction in a single call, e.g.
X = np.array([d[1:] for d in data])
Y = np.array([d[0] for d in data])
RandomForestClassifier().fit(X, Y).predict(X)
Of course, the array passed to predict should be different from the array passed to fit, but hopefully this makes the distinction clear.
In your scenario, you should treat each label as an independent prediction, compute a separate cross entropy for each, and sum the results.
In practice, after your second-to-last layer you create five splits, each leading to its own three-neuron sub-layer with a softmax activation; each of these gives you a categorical output on which you apply cross entropy.
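To make the summed-loss idea concrete, here is a minimal framework-free NumPy sketch of the loss computation; the five-heads-of-three-classes layout follows the description above, and the function and variable names are illustrative:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_loss(logits, targets):
    """Sum of categorical cross entropies over five 3-way heads.

    logits:  (batch, 5, 3) raw scores, one 3-way head per label
    targets: (batch, 5) integer class per label, each in {0, 1, 2}
    """
    probs = softmax(logits)                        # (batch, 5, 3)
    batch = np.arange(logits.shape[0])[:, None]    # (batch, 1)
    heads = np.arange(logits.shape[1])[None, :]    # (1, 5)
    picked = probs[batch, heads, targets]          # prob of the true class per head
    return -np.log(picked).sum(axis=-1).mean()     # sum over heads, mean over batch

# example: batch of 2 samples, 5 labels, 3 classes each
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 3))
targets = rng.integers(0, 3, size=(2, 5))
loss = multi_head_loss(logits, targets)
```

In a deep learning framework you would attach one three-neuron softmax layer per label to the shared penultimate layer and let the framework backpropagate through the summed loss.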
Best Answer
TensorFlow has a loss function, weighted_cross_entropy_with_logits, which can be used to give more weight to the 1s, so it should be applicable to a sparse multi-label classification setting like yours. According to the documentation, its pos_weight argument lets you trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):
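The original snippet for Keras 2.1.1 did not survive; the following is a sketch of the same idea using the current tf.keras API instead of the old backend internals. POS_WEIGHT is an illustrative value that you must tune for your data:

```python
import tensorflow as tf

POS_WEIGHT = 10.0  # multiplier on positive (1) targets; needs tuning

def weighted_binary_crossentropy(y_true, y_pred):
    # y_pred are probabilities from a sigmoid output layer;
    # convert them back to logits, since the tf loss expects logits
    eps = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, eps, 1 - eps)
    logits = tf.math.log(y_pred / (1 - y_pred))
    loss = tf.nn.weighted_cross_entropy_with_logits(
        labels=tf.cast(y_true, logits.dtype),
        logits=logits,
        pos_weight=POS_WEIGHT)
    # average over the label dimension so each sample gets one loss value
    return tf.reduce_mean(loss, axis=-1)
```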
Then in your model:
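The compile step itself looked roughly like this; the tiny model below is a hypothetical stand-in (layer sizes, input shape, and pos_weight are illustrative), with a self-contained loss wrapping tf.nn.weighted_cross_entropy_with_logits directly on logits:

```python
import tensorflow as tf

# hypothetical tiny model for 5 independent binary labels
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(5),  # linear outputs: the loss below expects logits
])

def loss_fn(y_true, logits):
    # pos_weight > 1 up-weights errors on the (sparse) positive labels
    return tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(
            labels=tf.cast(y_true, logits.dtype),
            logits=logits,
            pos_weight=5.0),
        axis=-1)

model.compile(optimizer='adam', loss=loss_fn)
```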
I have not yet found many resources that report well-working values for pos_weight in relation to the number of classes, the average number of active classes, etc.