Solved – The tanh activation function in backpropagation

backpropagation, machine learning, neural networks

In the backpropagation algorithm, when the output activation function is tanh and there are two classes (a binary problem), the values obtained at the output layer lie in the range $[-1, 1]$. The cross-entropy error function applies a logarithm to the predicted values, so if an output value is negative, the invalid operation $\log(\text{non-positive number})$ occurs and the cross-entropy loss is undefined.
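To illustrate the problem (a minimal NumPy sketch with hypothetical values, not from the original question): feeding tanh outputs directly into the binary cross-entropy produces NaNs whenever a predicted value is non-positive.

```python
import numpy as np

# Hypothetical pre-activations of the output unit
z = np.array([-2.0, 0.5, 1.5])

# tanh squashes into (-1, 1), so the "probabilities" can be negative
y_hat = np.tanh(z)                 # approx [-0.964, 0.462, 0.905]
y = np.array([0.0, 1.0, 1.0])      # binary targets

# Binary cross-entropy: the log of a non-positive number is undefined,
# so NumPy returns nan (with a runtime warning) for the first entry
loss = -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
print(loss)                        # [nan, ~0.772, ~0.100]
```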

This boils down to the following questions:

  • Is it invalid to use tanh as the output activation?

  • Should the output activation always be softmax, even for a binary
    problem?

Best Answer

Typically, the softmax function is used for multi-class problems and a single logistic (sigmoid) function for binary classification. The reason is that the output nonlinearity and the loss "match": the derivative of the loss with respect to the pre-activation becomes very simple, a property of generalized linear models.
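To spell out the "match" (a standard derivation, added here for clarity): with a logistic output $\hat{y} = \sigma(z) = 1/(1 + e^{-z})$ and the binary cross-entropy $L = -\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]$, the chain rule together with $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ gives

$$
\frac{\partial L}{\partial z}
= \left(\frac{1 - y}{1 - \hat{y}} - \frac{y}{\hat{y}}\right)\hat{y}\,(1 - \hat{y})
= \hat{y} - y .
$$

The same cancellation happens for softmax outputs paired with the multi-class cross-entropy, which is what makes these pairings so convenient in backpropagation.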

On a side note, tanh and the logistic sigmoid are related by a simple affine transformation: tanh is just the logistic sigmoid scaled and shifted from the $[0, 1]$ interval to $[-1, 1]$.
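Explicitly, the standard identity is

$$
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1,
$$

so a tanh output $t$ corresponds to the valid probability $(t + 1)/2$.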