Solved – Varied Activation Functions in Neural Networks

neural networks

I have been told by a professor that it is not possible to combine different activation functions within a neural network, and I can't find any examples of anyone doing this. However, I also cannot find any good explanation of why.

Conceptually, it seems to make sense: suppose I have a set of data on some population. Imagine the population is divided into two types of people, A and B, but I don't actually have that categorical variable in my data. We could imagine a logistic model that maps my population data into the categorical variable.

Now suppose I'm training a neural net with one hidden layer. It seems to me that it makes perfect sense for one node in that hidden layer to have a sigmoid activation function, representing the transformation of the input variables into this latent categorical variable (now as a probability, of course), while all the other nodes have a linear activation function. The output would then be a linear function of all the nodes in the hidden layer.
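In symbols (my own notation, just to make the idea concrete): writing $\sigma$ for the logistic function, $x$ for the input vector, and $k$ for the number of hidden nodes, the fitted model would be something like

$$\hat{y} \;=\; \beta_0 + \beta_1\,\sigma\!\left(w^\top x + b\right) + \sum_{j=2}^{k} \beta_j \left(v_j^\top x + c_j\right),$$

where the first hidden node is the sigmoid one and the remaining $k-1$ nodes are linear.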

I have no reason to assume this would reduce prediction error. What I want to know is this: is it possible to estimate such a model using standard approaches?

Best Answer

Clearly you can use different activation functions in a neural network; it is done all the time. An MLP with any hidden-layer activation and a softmax readout layer (e.g., for multi-class classification) is one example. An RNN with LSTM units uses at least two activation functions inside each unit (logistic and tanh), plus whatever activations are used elsewhere in the network. For a regression problem, ReLU activations in the hidden layers combined with a linear readout layer are standard.
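As for estimating such a model with standard approaches: backpropagation only requires each activation to be differentiable (almost everywhere), so mixing activations across units of the same layer poses no problem. Below is a minimal sketch, assuming PyTorch, of the architecture described in the question; all names, sizes, and the toy data are purely illustrative.

```python
import torch
import torch.nn as nn

class MixedActivationNet(nn.Module):
    """One hidden layer: a single sigmoid unit plus several linear units,
    followed by a linear readout (the setup described in the question)."""
    def __init__(self, n_inputs, n_linear_units):
        super().__init__()
        self.sigmoid_unit = nn.Linear(n_inputs, 1)                # hidden unit with sigmoid activation
        self.linear_units = nn.Linear(n_inputs, n_linear_units)   # hidden units with identity activation
        self.readout = nn.Linear(1 + n_linear_units, 1)           # linear output layer

    def forward(self, x):
        h_sig = torch.sigmoid(self.sigmoid_unit(x))   # the "latent category" node, output in (0, 1)
        h_lin = self.linear_units(x)                  # the remaining, linear hidden nodes
        h = torch.cat([h_sig, h_lin], dim=1)          # one hidden layer with mixed activations
        return self.readout(h)

# Standard training loop: autograd does not care that different units use
# different activations, only that each one is differentiable.
model = MixedActivationNet(n_inputs=5, n_linear_units=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 5)    # toy inputs, for illustration only
y = torch.randn(64, 1)    # toy targets
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Any framework that lets you concatenate layer outputs (the Keras functional API, for instance) works just as well; the optimizer never needs to know that the units use different activations.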