Solved – Dealing with sparse categories in binary cross-entropy

keras, neural-networks, sparse

In Keras, I'm using something similar to the Keras IMDB example to build a topic-modelling example. However, unlike that example, which has a single positive/negative classification, I have over a hundred topics which are not mutually exclusive. Every training example has a corresponding output vector of zeros with 3 or 4 ones, e.g.:

[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, ..., 0]
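(For reference, a multi-hot target like this can be built from per-example lists of topic indices. Below is a minimal sketch assuming 120 topic classes and made-up index lists, not my actual data.)

import numpy as np

num_topics = 120  # assumed number of topic classes
# hypothetical per-example lists of topic indices
topic_indices = [[7, 21, 22], [3, 45, 80, 101]]

y = np.zeros((len(topic_indices), num_topics), dtype='float32')
for row, topics in enumerate(topic_indices):
    y[row, topics] = 1.0  # set a 1 for every topic the example belongs to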

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(120, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Of course the model quickly jumps up to 95-97% accuracy, but when I look at the output it's predicting nothing but zeros. Clearly the class imbalance (every class has far more negative examples than positive ones) is causing the predictions to stay at 0. Is there a way to tweak the model so it handles sparse binary targets?
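(To see why plain accuracy is so flattering here: with roughly 3-4 positives out of 120 labels, a model that predicts all zeros is already right about 97% of the time. A quick check with those assumed numbers:)

num_labels = 120
positives_per_example = 3.5  # roughly 3 or 4 topics per example

# accuracy of a model that always predicts 0 for every label
all_zero_accuracy = 1 - positives_per_example / num_labels
print(all_zero_accuracy)  # ~0.97, matching the scores above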

Best Answer

I think the problem is the sigmoid activation function in your output layer. Binary cross-entropy computes the sigmoid again as part of the loss computation (see the description in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits). Just changing the activation function in the output layer to linear worked in our (similarly structured) case.
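As a rough sketch of what that change could look like, using the TensorFlow op linked above as a custom Keras loss (the loss function name below is my own, not something from the original post):

import tensorflow as tf
from keras.layers import Dense

def sigmoid_ce_from_logits(y_true, y_pred):
    # y_pred are raw logits from a linear output layer;
    # the sigmoid is applied inside the loss instead.
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred),
        axis=-1)

# output layer with linear activation instead of sigmoid
model.add(Dense(120, activation='linear'))
model.compile(loss=sigmoid_ce_from_logits,
              optimizer='adam',
              metrics=['accuracy'])

Note that the model then outputs logits rather than probabilities, so at inference time you would apply a sigmoid yourself (or threshold at 0 instead of 0.5).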