Neural Networks – Can Neural Networks Work With Negative and Zero Inputs? How to Handle Them

activation-function, neural-networks, relu

As the title suggests, I have several features which have values of either -1, 0 or 1. If I feed this data into a neural network where I use ReLu as the activation function for the hidden layers, would the negative and 0 values pose a problem to the NN?

I have heard about dead neurons: because ReLu outputs 0 for any input less than or equal to 0, the neuron supposedly stops learning and becomes dead. So naturally, if a NN with ReLu as the activation function is fed 0 or negative inputs, I would expect those neurons to become dead.

Now my data contains several features with 0 and negative values. What should I do in such a case? Should I use LeakyReLu or some other variation of ReLu, or should I transform my data so that only positive values remain?

EDIT 1: If negative and 0 inputs do not cause dead neurons, then what does cause them? And if ReLu alone can handle dead neurons, why do we have activation functions like LeakyReLu, PReLu and ELU?

Best Answer

  • (this has been said already by other answers) the ReLu activation function has a gradient equal to 0 (hence it stops learning) when the linear combination of the inputs is less than 0, not when the inputs themselves are 0 (see the short sketch after this list)
  • (this also has been said already by other answers) the inputs of NNs are often normalised around 0, so it is completely normal for some values to be 0 or negative. The weights of each neuron are also usually (randomly) initialised around 0, which means that when the linear combination is computed, some input values will switch sign; this is expected
  • the ReLu function is actually designed to produce a null gradient for pre-activation values below 0, so don't stress out about this. The problem of dead neurons comes up when all inputs for that neuron result in a null gradient. It is not a trivial problem to discuss, so I will gloss over it, but it has nothing to do with simply having some negative values in the inputs. As HitLuca has pointed out in his comment, having the neuron's parameters go to zero during the learning process will cause the neuron to die.
  • Of course other activation functions that never result in a null gradient (like leaky ReLu) will avoid dead neurons entirely.
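As a small illustration of the first and last points (not part of the original answer), here is a minimal PyTorch sketch. The single-neuron setup and the hand-picked weight values are hypothetical, chosen only to force the two cases: the same −1/0/1 inputs give a non-zero gradient when the pre-activation is positive, a null gradient when it is negative, and leaky ReLu keeps a small gradient even in the negative case.

```python
import torch

# Toy input containing exactly the values from the question: -1, 0 and 1.
x = torch.tensor([[-1.0, 0.0, 1.0]])

# Two hand-picked weight vectors for a single neuron (hypothetical values):
# one yields a positive pre-activation z, the other a negative one,
# for the very same inputs.
w_pos = torch.tensor([[-0.5], [0.5], [0.5]], requires_grad=True)  # z = +1.0
w_neg = torch.tensor([[0.5], [0.5], [-1.0]], requires_grad=True)  # z = -1.5

for name, w in [("z > 0", w_pos), ("z < 0", w_neg)]:
    z = x @ w                  # linear combination of the inputs
    out = torch.relu(z)        # ReLU acts on the pre-activation, not on the raw inputs
    out.sum().backward()
    print(name, "relu(z) =", out.item(), "weight grads =", w.grad.flatten().tolist())

# z > 0: relu(z) = 1.0, weight grads = [-1.0, 0.0, 1.0]  -> the neuron still learns
#        (the middle gradient is 0 only because that input feature is 0, not because of ReLU)
# z < 0: relu(z) = 0.0, weight grads = [0.0, 0.0, 0.0]   -> null gradient

# Leaky ReLU keeps a small non-zero gradient even for a negative pre-activation.
w = torch.tensor([[0.5], [0.5], [-1.0]], requires_grad=True)
out = torch.nn.functional.leaky_relu(x @ w, negative_slope=0.01)
out.sum().backward()
print("leaky relu weight grads =", w.grad.flatten().tolist())  # [-0.01, 0.0, 0.01]
```

In other words, the gradient vanishes only because the pre-activation is negative; the −1 and 0 entries in the input are harmless on their own, and with leaky ReLu even the negative pre-activation case keeps learning, just more slowly.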