Solved – What is a binary loss, and should I use a binary loss or a softmax loss for classification?

cross-entropy, loss-functions

I was trying to understand the final section of the paper Revisiting Baselines for Visual Question Answering. The authors state that their model performs better with a binary loss than with a softmax loss.

What is a binary loss (in this case)? Is the softmax loss a synonym for binary cross-entropy? Should I use a binary loss or a softmax loss for classification?

Best Answer

There is a nice explanation here:

Binary cross-entropy loss is also called sigmoid cross-entropy loss. It is a sigmoid activation followed by a cross-entropy loss. Unlike the softmax loss, it is independent for each vector component (class): the loss computed for one component is not affected by the values of the other components.
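
As a rough sketch of the difference (using PyTorch; the logits and targets below are toy values, not from the paper): with a softmax loss the target is a single class index and the classes compete, while with a sigmoid/binary loss the target is a 0/1 vector and each component is scored on its own.

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, -1.0, 0.5]])  # raw scores for 3 classes

    # Softmax loss: one cross-entropy over all classes together;
    # the target is a single class index, and raising one logit
    # pushes the probabilities of the other classes down.
    softmax_loss = F.cross_entropy(logits, torch.tensor([0]))

    # Sigmoid (binary) cross-entropy: the target is a 0/1 vector and
    # each component is an independent two-class problem, so one
    # component's loss ignores the other components entirely.
    binary_loss = F.binary_cross_entropy_with_logits(
        logits, torch.tensor([[1.0, 0.0, 0.0]])
    )

    print(softmax_loss.item(), binary_loss.item())

One practical consequence of this independence: the sigmoid loss also accepts targets with several classes set to 1 at once (a multi-label setup), which a softmax over a single class index cannot express.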

The term binary refers to the number of classes being 2: each vector component is scored as its own two-class (yes/no) problem.
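
To see why "binary" really is the two-class case: a sigmoid over a single score z is exactly a two-class softmax over the scores [z, 0]. A quick NumPy check (the value of z is arbitrary):

    import numpy as np

    z = 1.3  # an arbitrary score for one component
    p_sigmoid = 1.0 / (1.0 + np.exp(-z))               # sigmoid(z)
    p_softmax = np.exp(z) / (np.exp(z) + np.exp(0.0))  # softmax([z, 0])[0]
    assert np.isclose(p_sigmoid, p_softmax)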