Solved – Neural Networks: What activation function should I choose for hidden layers in regression models?

Tags: neural-networks, regression

I am experimenting with neural networks for regression tasks. I know some theory, including how to choose the activation function for the output layer.
What is not clear to me is: how do I choose the activation for the hidden layers?

Best Answer

With respect to choosing hidden-layer activations, I don't think there's anything about a regression task that differs from other neural network tasks: you should use nonlinear activations so that the model itself is nonlinear (otherwise you're just doing a very slow, expensive linear regression), and you should use activations that are easy to train, such as ReLU or its relatives. A sketch of that setup follows below.
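As an illustration, here is a minimal sketch in Keras, assuming a hypothetical 10-feature input (the input and hidden widths are placeholder choices, not recommendations): the hidden layers are nonlinear, while the output layer is left linear, as is usual for regression.

```python
import tensorflow as tf

# Regression MLP: nonlinear hidden layers, linear output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                   # hypothetical 10 input features
    tf.keras.layers.Dense(64, activation="relu"),  # nonlinear hidden layer
    tf.keras.layers.Dense(64, activation="relu"),  # nonlinear hidden layer
    tf.keras.layers.Dense(1),                      # no activation: linear output
])
model.compile(optimizer="adam", loss="mse")
```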

Recent research has found that ReLU and similar activations (ELU, Leaky ReLU, etc.) work very well because they allow researchers to build deep networks that do not suffer from vanishing or exploding gradients for positive inputs, where the derivative is exactly 1.
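For reference, these activations differ only in how they treat negative inputs. A quick NumPy sketch of the three, with common default slopes:

```python
import numpy as np

def relu(x):
    # Identity for x > 0; exactly zero (and zero gradient) for x <= 0.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope alpha for x < 0 keeps a nonzero gradient everywhere.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smoothly saturates toward -alpha for very negative x; the gradient
    # alpha * exp(x) stays nonzero, unlike plain ReLU.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```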

For negative inputs, ReLU has derivative 0, which can lead to the "dead ReLU" phenomenon: a unit whose pre-activation stays negative receives no gradient and stops learning. So I prefer ELU or LeakyReLU units, which are more robust to that problem.
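If you want to try those alternatives in the sketch above, they are a drop-in swap for the hidden layers (mixing both in one model here is only for illustration, and the layer sizes are still placeholders). Note that in Keras, LeakyReLU is applied as its own layer after a linear Dense layer:

```python
import tensorflow as tf

# Same regression sketch, with ELU and LeakyReLU hidden units instead of ReLU.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="elu"),  # ELU hidden layer
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(),                  # LeakyReLU applied as a layer
    tf.keras.layers.Dense(1),                     # linear output for regression
])
model.compile(optimizer="adam", loss="mse")
```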