I have coded a neural network with 1 hidden layer and 1 numerical output. No biases.
By appropriate choice of the activation function, I can easily approximate some continuous functions.
However, even something as simple as
f(x) = 50 if x >= 100, and 25 otherwise
I cannot approximate using my neural net. I've tried changing the learning rate and the number of hidden nodes, but it simply won't converge.
Why does it do so poorly at this incredibly simple function? What can I do to make it converge? It is not just this function; other similarly discontinuous functions fail in the same way.
Best Answer
Wikipedia provides a synopsis of the universal approximation theorem.
This theorem is the core justification for attempting to model complex, nonlinear phenomena using neural networks. But flexible as neural networks are, the theorem doesn't cover everything: you've defined a discontinuous function, and the universal approximation theorem only applies to continuous functions.
I am not aware of a theorem which allows a neural network to approximate arbitrary, discontinuous functions.
Perhaps if you treated the two cases of your target variable as categorical outcomes and trained with cross-entropy loss, you would have success approximating the decision boundary between them.
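As a minimal sketch of that idea (assuming numpy; the data, learning rate, and iteration count are illustrative choices, not something from your setup): fit a single sigmoid unit with cross-entropy loss to predict which side of the threshold an input falls on, then map the predicted class back to the outputs 25 and 50.

```python
import numpy as np

# Recast f(x) = 50 if x >= 100 else 25 as a two-class problem:
# class 1 for x >= 100, class 0 otherwise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 200, size=1000)   # illustrative training inputs
y = (x >= 100).astype(float)         # class labels

# Standardize the input so gradient descent behaves well.
mu, sigma = x.mean(), x.std()
xs = (x - mu) / sigma

w, b = 0.0, 0.0                      # single logistic unit
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * xs + b)))   # sigmoid output
    grad_w = np.mean((p - y) * xs)            # cross-entropy gradients
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

def f_hat(x_new):
    """Map the predicted class back to the original outputs 25/50."""
    z = w * (x_new - mu) / sigma + b
    p = 1.0 / (1.0 + np.exp(-z))
    return np.where(p >= 0.5, 50.0, 25.0)

print(f_hat(np.array([50.0, 150.0])))   # points well clear of the threshold
```

The point is that the classifier only needs to learn where the jump is, not to squeeze a smooth regression output through a discontinuity, so the usual squared-error convergence trouble disappears.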