I was trying to understand the final section of the paper "Revisiting Baselines for Visual Question Answering". The authors state that their model performs better with a binary loss than with a softmax loss.
What is a binary loss in this case? Is "softmax loss" a synonym for binary cross-entropy? Should I use a binary loss or a softmax loss for classification?
Best Answer
The term "binary" refers to the number of classes being 2. A binary (sigmoid cross-entropy) loss applies a sigmoid to each output score and treats each class as an independent yes/no decision. A softmax loss (softmax cross-entropy) instead normalizes all scores into a single probability distribution, so the classes compete and exactly one is assumed correct.
So no, they are not synonyms: binary cross-entropy is the two-class special case of the softmax loss, but unlike the softmax loss it also extends naturally to multi-label settings where several classes can be correct at once. As a rule of thumb, use a softmax loss when classes are mutually exclusive, and a binary loss when they are not. That is plausibly why it helps in VQA, where a question can have more than one acceptable answer.
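To make the difference concrete, here is a minimal numpy sketch (the logit values and targets are made up for illustration) computing both losses on the same raw scores:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Raw scores (logits) for 4 candidate answer classes.
logits = np.array([2.0, -1.0, 0.5, 0.1])

# Softmax loss: one distribution over all classes,
# exactly one class assumed correct.
target_class = 0
probs = softmax(logits)
softmax_loss = -np.log(probs[target_class])

# Binary (sigmoid cross-entropy) loss: each class is an independent
# yes/no decision, so several classes can be "correct" at once.
targets = np.array([1.0, 0.0, 1.0, 0.0])  # multi-label targets
p = sigmoid(logits)
binary_loss = -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()
```

Note that `probs` sums to 1 (the classes compete), while the sigmoid outputs `p` are independent per class and need not sum to 1.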