Solved – My neural network can’t even learn Euclidean distance

euclidean, keras, machine learning, neural networks, optimization

So I'm trying to teach myself neural networks (for regression applications, not classifying pictures of cats).

My first experiments were training a network to implement an FIR filter and a Discrete Fourier Transform (training on "before" and "after" signals), since those are both linear operations that can be implemented by a single layer with no activation function. Both worked fine.
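
For concreteness, the DFT version was along these lines (a minimal sketch with placeholder sizes and hyperparameters, not my exact code — real part of an 8-point DFT only):

```python
import numpy as np
from tensorflow import keras

# Learn the real part of an 8-point DFT from "before"/"after" signal pairs.
# It's a purely linear map, so a single Dense layer with no activation suffices.
N = 8
X = np.random.randn(20000, N)
F = np.array([[np.cos(2 * np.pi * k * n / N) for n in range(N)] for k in range(N)])
Y = X @ F.T                                   # "after" signals

dft_model = keras.Sequential([keras.layers.Dense(N, use_bias=False, input_shape=(N,))])
dft_model.compile(optimizer='adam', loss='mse')
dft_model.fit(X, Y, epochs=50, batch_size=64, verbose=0)
# dft_model.layers[0].get_weights()[0] ends up ≈ F.T
```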

So then I wanted to see if I could add an abs() and make it learn an amplitude spectrum. First I thought about how many nodes it would need in the hidden layer, and realized that 3 ReLUs are sufficient for a crude approximation of abs(x+jy) = sqrt(x² + y²), so I tested that operation by itself on single complex numbers (2 inputs → hidden layer of 3 ReLU nodes → 1 output). Occasionally it works:

[Figure: 3 ReLUs implementing Euclidean distance as an inverted hexagonal pyramid]
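
To make the "crude approximation" claim concrete, here is a quick NumPy check of the hand-built solution (three weight vectors 120° apart with unit output weights; the weights the trained network actually finds will differ by rotation and scale):

```python
import numpy as np

# Hand-built solution: three ReLUs whose weight vectors point 120° apart,
# summed with unit output weights. The result is a hexagonal cone that
# approximates sqrt(x² + y²) to within about 13%.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
W = np.stack([np.cos(angles), np.sin(angles)])          # shape (2, 3)

def three_relu_abs(x, y):
    pre = x[..., None] * W[0] + y[..., None] * W[1]     # pre-activations, shape (..., 3)
    return np.maximum(pre, 0.0).sum(axis=-1)

x, y = np.random.randn(2, 100_000)
ratio = three_relu_abs(x, y) / np.hypot(x, y)
print(ratio.min(), ratio.max())                         # ≈ 0.87 … 1.0
```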

But most of the time it gets stuck in a local minimum and fails to find the right shape:

[Figure: 3 ReLUs forming a valley-shaped network]

[Figure: loss vs. epochs]
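
For reference, the whole setup is essentially just this (a minimal sketch; the learning rate, batch size, and epoch count here are placeholders rather than the exact values I used):

```python
import numpy as np
from tensorflow import keras

# Training data: random complex numbers as (real, imag) pairs, target = magnitude.
X = np.random.randn(10000, 2)
y = np.hypot(X[:, 0], X[:, 1])

# 2 inputs -> hidden layer of 3 ReLUs -> 1 linear output.
model = keras.Sequential([
    keras.layers.Dense(3, activation='relu', input_shape=(2,)),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss='mse')
model.fit(X, y, epochs=200, batch_size=64, verbose=0)
```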

I've tried all the optimizers and ReLU variants in Keras, but they don't make much difference. Is there something else I can do to make simple networks like this converge reliably? Or am I approaching this with the wrong attitude, and you're just supposed to throw far more nodes than necessary at the problem and not consider it a big deal if half of them die?

Best Answer

The output strongly suggests that one or more of your neurons has gone dead (or perhaps that the weight hyperplanes of two of your neurons have merged). You can see that with 3 ReLUs you get 3 shadowy splits in the center when you converge to the more reasonable solution. You can easily verify whether this is the case by checking the output value of each neuron and seeing whether it stays dead for the large majority of your samples. Alternatively, you could plot all $2 \times 3 = 6$ neuron weights, grouped by their respective neuron, to see whether two neurons have collapsed onto the same pair of weights.
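
For example, reusing `model` (2 → 3 ReLU → 1) and `X` from the sketch in the question, both checks take only a few lines:

```python
import numpy as np

# Kernel W has shape (2, 3): one column of weights per hidden neuron.
W, b = model.layers[0].get_weights()
acts = np.maximum(X @ W + b, 0.0)          # hidden ReLU activations, shape (n, 3)

# A dead neuron outputs zero for (nearly) all samples.
print("inactive fraction per neuron:", (acts == 0).mean(axis=0))

# Two (near-)identical columns of W mean two neurons share the same hyperplane.
print("weight columns (one per neuron):\n", W)
print("biases:", b)
```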

I suspect that one possible cause of this is when $x+iy$ is skewed toward one coordinate, e.g. $x\gg y$, in which case you're essentially trying to reproduce the identity, since then $|x+iy|\approx x$. There's really not much you can do to remedy this. One option is to add more neurons, as you suggested. A second option is to try a smooth activation, like a sigmoid, or perhaps something unbounded like an exponential. You could also try dropout (with, say, 10% probability), using the regular dropout implementation in Keras, which is hopefully smart enough to ignore situations where all 3 of your neurons drop out.
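
Concretely, each of these suggestions is only a small change to the model in the question (a sketch; the layer sizes and dropout rate are arbitrary):

```python
from tensorflow import keras

# Option 1: more neurons than strictly necessary.
wider = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(2,)),
    keras.layers.Dense(1),
])

# Option 2: a smooth activation instead of ReLU.
smooth = keras.Sequential([
    keras.layers.Dense(3, activation='sigmoid', input_shape=(2,)),
    keras.layers.Dense(1),
])

# Option 3: keep the 3 ReLUs but add ~10% dropout during training.
dropped = keras.Sequential([
    keras.layers.Dense(3, activation='relu', input_shape=(2,)),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(1),
])
```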