Neural Networks – Why are Rectified Linear Units Considered Non-Linear?

deep learning, neural networks

Why are activation functions of rectified linear units (ReLU) considered non-linear?

$$ f(x) = \max(0,x)$$

They are linear when the input is positive, and from my understanding non-linear activations are a must to unlock the representational power of deep networks; otherwise the whole network could be represented by a single layer.
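For concreteness, here is a minimal NumPy sketch (my own illustration, not part of the original question) of that collapse: stacking purely linear layers is equivalent to a single linear layer.

```python
import numpy as np

# Two stacked linear layers with no activation in between.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2

# The same map as a single linear layer: weights W2 @ W1, bias W2 @ b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

assert np.allclose(two_layers, one_layer)
```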

Best Answer

ReLUs are nonlinearities. To help your intuition, consider a very simple network with 1 input unit $x$, 2 hidden units $y_i$, and 1 output unit $z$. With this simple network we could implement an absolute value function,

$$z = \max(0, x) + \max(0, -x),$$

or something that looks similar to the commonly used sigmoid function,

$$z = \max(0, x + 1) - \max(0, x - 1).$$
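Here is a quick NumPy check of both identities (my own sketch, not part of the original answer); `relu` below is just $\max(0, \cdot)$:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.linspace(-3, 3, 601)

# Absolute value from two ReLU hidden units.
assert np.allclose(relu(x) + relu(-x), np.abs(x))

# Sigmoid-like saturation: ramps linearly from 0 to 2 on [-1, 1]
# and is flat outside that interval.
assert np.allclose(relu(x + 1) - relu(x - 1), np.clip(x + 1, 0, 2))
```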

By combining these building blocks into larger networks, or simply by using more hidden units, we can approximate arbitrary functions.

[Figure: ReLU network function]
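To make that last point concrete, here is a small NumPy sketch (my own, with hand-picked rather than learned weights) that approximates $\sin(x)$ with a sum of shifted ReLU units, i.e. a one-hidden-layer ReLU network:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Knots where the piecewise-linear approximation is allowed to "kink".
knots = np.linspace(0, 2 * np.pi, 20)
target = np.sin(knots)

# Each hidden unit relu(x - k) changes the slope by its output weight;
# choosing the weights as slope differences interpolates the target.
slopes = np.diff(target) / np.diff(knots)
weights = np.diff(slopes, prepend=0.0)

x = np.linspace(0, 2 * np.pi, 500)
approx = target[0] + sum(w * relu(x - k) for w, k in zip(weights, knots[:-1]))

print("max error:", np.max(np.abs(approx - np.sin(x))))  # shrinks as knots are added
```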