Solved – Does a Neural Network actually need an activation function or is that just for Back Propagation

approximation, backpropagation, gradient descent, neural networks

I have a feed-forward neural network (1 hidden layer with 10 neurons, 1 output layer with 1 neuron) with no activation function (only transfer by weight + bias) that can learn a really wonky sine wave (using a 2-in, 1-out sliding window) to production-usable accuracy, trained via stochastic hill climbing in a couple of seconds:

// 10,000 samples of a sine wave whose frequency falls off over time,
// rescaled from [-1, 1] into [0, 1]
for (int d = 0; d < 10000; d++)
    data.Add((float)(Math.Sin((float)d * (1 / (1 + ((float)d / 300)))) + 1) / 2);
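For context, here's a minimal sketch of the kind of forward pass I'm describing (names are illustrative, not copied from the linked source): every layer is just weight times input plus bias, nothing else in between.

// Illustrative only: 2 inputs -> 10 linear hidden units -> 1 linear output
static float Forward(float[] input, float[,] hiddenW, float[] hiddenB,
                     float[] outW, float outB)
{
    var hidden = new float[hiddenB.Length];
    for (int j = 0; j < hidden.Length; j++)
    {
        float sum = hiddenB[j];
        for (int i = 0; i < input.Length; i++)
            sum += hiddenW[i, j] * input[i];
        hidden[j] = sum; // no activation function applied here
    }

    float y = outB;
    for (int j = 0; j < hidden.Length; j++)
        y += outW[j] * hidden[j];
    return y; // and none here either
}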

I'm probably just drunk, but if you don't use an activation function, do you lose the universal function approximator status? Or is the activation only there to give gradient descent / backpropagation a differentiable function to work with?

Alternatively, have I just overlooked a bug and am actually applying an activation somewhere without realising it?

source in C# (draws on a form)

Best Answer

You built a multilayer neural network with a linear hidden layer. Linear units in the hidden layer negate the purpose of having a hidden layer: the weights between your inputs and the hidden layer, and the weights between the hidden layer and the output layer, are effectively a single set of weights. A neural network with a single set of weights is a linear model performing regression.

Here's your linear hidden layer written as a stack of units (one row of weights per unit): $$ H = [h_1, h_2, \dots, h_n] $$

The equation that governs the forward propagation of $x$ through your network is then $$ \bar{y} = W'(Hx) = (W'H)x $$ Thus an $n$-layered feed-forward neural network with linear hidden layers is equivalent to a single output layer with weights $$ W = W'\prod_i H_i $$
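To see the collapse numerically, here's a small sketch (sizes match your 2-in / 10-hidden / 1-out network, weights are arbitrary random values): computing $W'(Hx)$ and $(W'H)x$ gives the same output up to floating-point rounding.

using System;

var rng = new Random(0);
int nIn = 2, nHidden = 10;

// H: hidden-layer weight matrix (nHidden x nIn), Wout: output weights (1 x nHidden)
var H = new double[nHidden, nIn];
var Wout = new double[nHidden];
for (int j = 0; j < nHidden; j++)
{
    Wout[j] = rng.NextDouble() - 0.5;
    for (int i = 0; i < nIn; i++)
        H[j, i] = rng.NextDouble() - 0.5;
}

double[] x = { 0.3, 0.7 };

// Two linear layers: y1 = W'(Hx)
var h = new double[nHidden];
for (int j = 0; j < nHidden; j++)
    for (int i = 0; i < nIn; i++)
        h[j] += H[j, i] * x[i];
double y1 = 0;
for (int j = 0; j < nHidden; j++)
    y1 += Wout[j] * h[j];

// Collapsed single layer: W = W'H, then y2 = Wx
var W = new double[nIn];
for (int i = 0; i < nIn; i++)
    for (int j = 0; j < nHidden; j++)
        W[i] += Wout[j] * H[j, i];
double y2 = 0;
for (int i = 0; i < nIn; i++)
    y2 += W[i] * x[i];

Console.WriteLine($"{y1} == {y2}"); // identical up to floating-point rounding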

If you only have linear units then the hidden layer(s) are doing nothing. Hinton et al. recommend rectified linear units, $\max(0, x)$: they're simple and don't suffer from the vanishing gradient problem of sigmoidal functions. Similarly you might choose the soft-plus function, $\log(1 + e^x)$, which is a non-sparse, smooth approximation to the ReLU.
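As a concrete sketch (illustrative code, not tied to your source), either of these would be applied to each hidden unit's weighted sum before it is passed on:

static float ReLU(float x) => Math.Max(0f, x);                        // max(0, x)
static float SoftPlus(float x) => (float)Math.Log(1.0 + Math.Exp(x)); // log(1 + e^x)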
