Solved – Neural network for multi-label classification with a large number of classes outputs only zeros

classification, machine learning, multi-class, multilabel, neural networks

I am training a neural network for multi-label classification with a large number of classes (1000), which means more than one output can be active for each input. On average, I have two classes active per output frame. When training with a cross-entropy loss, the neural network resorts to outputting only zeros, because this minimizes the loss: since 99.8% of my labels are zeros, predicting all zeros is almost always correct. Any suggestions on how I can push the network to give more weight to the positive classes?
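To make that failure mode concrete, here is a minimal sketch (synthetic numbers chosen to match the roughly 2-in-1000 sparsity described above; the constant prediction p = 0.001 is just an illustrative value) showing how small the mean binary cross-entropy already is when the network predicts a near-zero probability for everything:

import numpy as np

# Synthetic frame: 1000 classes, 2 of them active, and a network that
# predicts the same tiny probability p for every class
n_classes, n_pos, p = 1000, 2, 0.001
y = np.zeros(n_classes)
y[:n_pos] = 1.0

# Mean binary cross-entropy over the frame
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
print(bce)  # ~0.015, even though both positives are completely missed

The 998 easy negatives dominate the average, so the optimizer has little incentive to risk false positives by raising any of the outputs.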

Best Answer

TensorFlow has a loss function, tf.nn.weighted_cross_entropy_with_logits, which can be used to give more weight to the 1s, so it should be applicable to a sparse multi-label classification setting like yours.

From the documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.

The argument pos_weight is used as a multiplier for the positive targets.
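Spelled out, the per-class expression this function computes (as given in the TensorFlow documentation, with z the target, x the logit, and q the pos_weight) is:

q * z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))

so with q > 1 a missed positive costs q times as much as a false positive of the same confidence.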

If you use the TensorFlow backend in Keras, you can use the loss function like this (Keras 2.1.1):

import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor 
    and a target tensor. POS_WEIGHT is used as a multiplier 
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

model.compile(loss=weighted_binary_crossentropy, ...)
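A practical side note: since this is a custom loss function, Keras will not recognize it by name when deserializing a saved model, so it has to be passed via custom_objects (the file name below is just a placeholder):

from keras.models import load_model

# Map the loss name stored in the saved model to the actual function
model = load_model('my_model.h5',
                   custom_objects={'weighted_binary_crossentropy':
                                   weighted_binary_crossentropy})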

I have not yet found many resources that report well-working values for pos_weight in relation to the number of classes, the average number of active classes, and so on.
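That said, a common rule of thumb for imbalanced binary targets (an assumption on my part, not something the TensorFlow documentation prescribes) is to start from the negative-to-positive ratio of the training labels and then tune on a validation set while watching precision and recall:

import numpy as np

# Synthetic stand-in for your binary label matrix (n_samples, n_classes),
# with 2 random positives per frame out of 1000 classes
y_train = np.zeros((5000, 1000))
for row in y_train:
    row[np.random.choice(1000, size=2, replace=False)] = 1.0

pos = y_train.sum()
neg = y_train.size - pos
POS_WEIGHT = neg / pos  # (1000 - 2) / 2 = 499 at this sparsity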
