Solved – Choice of neural net hidden activation function

classification, machine learning, neural networks

I have read elsewhere that one's choice of hidden-layer activation function in a NN should be based on one's need, i.e. if you need values in the range -1 to 1, use tanh; for the range 0 to 1, use the sigmoid.

My question is: how does one know what one's need is? Is it based on the range of the input layer, e.g. choosing the function that can encompass the input layer's full range of values, or one that somehow reflects the input layer's distribution (e.g. Gaussian)? Or is the need problem/domain specific, so that experience and judgement are required to make this choice? Or is it simply "use whichever gives the best cross-validated error?"

Best Answer

LeCun discusses this in Efficient Backprop, Section 4.4. The motivation is similar to the motivation for normalizing the inputs to zero mean (Section 4.3): the average output of the tanh activation function is more likely to be close to zero than that of the sigmoid, whose output is always positive, so its average output must be positive as well.
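A quick numerical sketch of this point (not from the answer itself): feeding zero-mean Gaussian samples through both functions, tanh's average output sits near 0 while the sigmoid's sits near 0.5, since the sigmoid can never output a negative value.

```python
import math
import random

def sigmoid(x):
    """Logistic sigmoid: maps R to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# Zero-mean, unit-variance inputs, as after input normalization (Section 4.3)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)

print(f"mean tanh output:    {mean_tanh:+.3f}")     # near 0
print(f"mean sigmoid output: {mean_sigmoid:+.3f}")  # near 0.5
```

Non-zero-mean activations feeding the next layer create the same systematic bias in the weight updates that unnormalized inputs do, which is why the two sections share a motivation.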