Neural Network: For Binary Classification, use 1 or 2 output neurons?

classification, machine learning, neural networks

Assume I want to do binary classification (something belongs to class A or class B). There are two ways to do this in the output layer of a neural network:

  • Use 1 output node with a sigmoid activation. An output below 0.5 is treated as class A, and an output of 0.5 or above as class B.

  • Use 2 output nodes. The input belongs to the class of the node with the highest value/probability (argmax).

Are there any papers that discuss this? Which keywords should I search for?

This question has been asked before on this site (e.g. see this link), but it received no real answers. I need to make a choice for my Master's thesis, so I would like some insight into the pros, cons, and limitations of each solution.

Best Answer

In the second case you are probably referring to the softmax activation function. If so, then the sigmoid is just a special case of the softmax function. That's easy to show.

$$ y = \frac{1}{1 + e ^ {-x}} = \frac{1}{1 + \frac{1}{e ^ x}} = \frac{1}{\frac{e ^ x + 1}{e ^ x}} = \frac{e ^ x}{1 + e ^ x} = \frac{e ^ x}{e ^ 0 + e ^ x} $$

As you can see, the sigmoid is the same as a softmax over two outputs whose inputs are $0$ and $x$. You can think of it as having two output units, where one of them has all of its weights (and its bias) equal to zero, so its pre-activation value is always zero.
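If you want to check the identity numerically, here is a minimal sketch in plain Python. The helper names `sigmoid` and `softmax` are my own, not from any particular library, and the test values of `x` are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic sigmoid of a single real-valued input."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Softmax over a list of inputs, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The second softmax output for inputs [0, x] should match sigmoid(x).
for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    p_sigmoid = sigmoid(x)
    p_softmax = softmax([0.0, x])[1]
    assert math.isclose(p_sigmoid, p_softmax, rel_tol=1e-12)
    print(f"x={x:+.1f}  sigmoid={p_sigmoid:.6f}  softmax[1]={p_softmax:.6f}")
```

The two columns agree up to floating-point rounding, which is what the derivation predicts.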

So for binary classification, one output unit with sigmoid is a better choice than two output units with softmax: it represents the same model with fewer weights, so it will update faster.
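The same argument works for two arbitrary output values, not just $0$ and $x$: a two-class softmax depends only on the difference between its two inputs, so the second unit adds no expressive power. The sketch below illustrates this under my own assumptions (hypothetical helper names, arbitrary example values, ties assigned to class B to match the `>= 0.5` rule from the question).

```python
import math

def sigmoid(x):
    """Logistic sigmoid of a single real-valued input."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax2(a, b):
    """Two-class softmax; returns (P(class A), P(class B))."""
    m = max(a, b)  # shift by the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def predict_one_output(x):
    """Single sigmoid unit: class B when the probability is >= 0.5."""
    return "B" if sigmoid(x) >= 0.5 else "A"

def predict_two_outputs(a, b):
    """Two softmax units: pick the more probable class (ties go to B)."""
    p_a, p_b = softmax2(a, b)
    return "B" if p_b >= p_a else "A"

for a, b in [(1.0, 2.5), (4.0, -1.0), (0.3, 0.3), (-2.0, -1.5)]:
    p_b = softmax2(a, b)[1]
    # Only the difference b - a matters.
    assert math.isclose(p_b, sigmoid(b - a), rel_tol=1e-12)
    # Adding the same constant to both inputs changes nothing.
    assert math.isclose(p_b, softmax2(a + 10.0, b + 10.0)[1], rel_tol=1e-12)
    # Argmax over two outputs agrees with thresholding the sigmoid at 0.5.
    assert predict_two_outputs(a, b) == predict_one_output(b - a)
```

In other words, the argmax rule over two softmax outputs and the 0.5 threshold on a single sigmoid output make the same decision once you feed the sigmoid the difference of the two inputs.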