Solved – What is required for a neural network to approximate a discontinuous function

neural networks

I have coded a neural network with 1 hidden layer and 1 numerical output. No biases.

By appropriate choice of the activation function, I can easily approximate some continuous functions.

However, even something as simple as

$$f(x) = \begin{cases} 50 & \text{if } x \geq 100 \\ 25 & \text{otherwise} \end{cases}$$

I cannot approximate with my neural net. I've tried changing the learning rate and the number of hidden nodes, but it simply won't converge.

Why does it do so poorly on this incredibly simple function? What can I do to make it converge? It is not just this function; other, similarly discontinuous functions fail in the same way.
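
For reference, here is a minimal sketch of the kind of setup I mean (written in PyTorch purely for illustration; the hidden size, activation, and learning rate are placeholders, not my exact code):

```python
# Minimal sketch of the setup described above: one hidden layer,
# no biases, a single numerical output, plain MSE regression.
import torch
import torch.nn as nn

# Target: f(x) = 50 if x >= 100, else 25
x = torch.linspace(0.0, 200.0, 400).unsqueeze(1)
y = 25.0 + 25.0 * (x >= 100.0).float()

model = nn.Sequential(
    nn.Linear(1, 16, bias=False),   # hidden layer (no bias)
    nn.Tanh(),                      # placeholder activation
    nn.Linear(16, 1, bias=False),   # numerical output (no bias)
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # the loss plateaus instead of converging, as described above
```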

Best Answer

Wikipedia provides a synopsis of the universal approximation theorem.

In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.
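
In symbols, the statement amounts to roughly the following (omitting the precise hypotheses on the activation $\sigma$): for every continuous $f$ on a compact set $K \subset \mathbb{R}^n$ and every $\varepsilon > 0$, there is some finite width $N$, weights $w_i \in \mathbb{R}^n$, biases $b_i \in \mathbb{R}$, and output weights $v_i \in \mathbb{R}$ such that the single-hidden-layer network $F$ is uniformly within $\varepsilon$ of $f$ on $K$:

$$
F(x) \;=\; \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon .
$$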

This theorem is the core justification for attempting to model complex, nonlinear phenomena using neural networks. But as flexible as it is, it doesn't cover everything: in this case, you've defined a discontinuous function, and the universal approximation theorem only extends to continuous functions.

I am not aware of a theorem which allows a neural network to approximate arbitrary discontinuous functions.


Perhaps if you treated the two cases of your target variable as categorical outcomes and used a cross-entropy loss, you would have success approximating the decision boundary between them; a rough sketch of what I mean is below.
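
For instance, something along these lines (a sketch in PyTorch; the layer sizes, optimizer settings, and input scaling are arbitrary choices, not a prescription): learn the class "$x \geq 100$" with a binary cross-entropy loss, then map the predicted class back to the values 50 and 25.

```python
# Sketch: treat the two output values as classes, learn the decision
# boundary with cross-entropy, then map the predicted class back to 25/50.
import torch
import torch.nn as nn

x = torch.linspace(0.0, 200.0, 400).unsqueeze(1)
x_in = x / 100.0                          # scale inputs so training is better behaved
labels = (x >= 100.0).float()             # class 1 if x >= 100, else class 0

clf = nn.Sequential(
    nn.Linear(1, 8),
    nn.Tanh(),
    nn.Linear(8, 1),                      # logit for the event "x >= 100"
)

optimizer = torch.optim.Adam(clf.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()          # binary cross-entropy on the logits

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(clf(x_in), labels)
    loss.backward()
    optimizer.step()

# Map the predicted class back to the original output values.
with torch.no_grad():
    pred_class = (torch.sigmoid(clf(x_in)) > 0.5).float()
    y_hat = 25.0 + 25.0 * pred_class      # 25 or 50
```

The discontinuity then lives in the classifier's decision boundary rather than in a regression target, which is what the cross-entropy suggestion above is getting at.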
