I've implemented a neural network and I'm training it to compute XOR. About 1 out of every 5 to 10 runs it fails to learn; it then gives e.g. 0.67 instead of 0 as output for input (1,1). Is this just unlucky randomization of the initial weights, and should I move on to my real problem instance, or should I solve this first? What could be the cause?
Some more background info:
I'm using f(x) = 1/(1+exp(-x)) as the activation function for both the hidden neurons and the output neuron. The hidden and output neurons each have a bias. All weights are initialized to random numbers between 0 and 1. I'm using the backpropagation algorithm as described here: https://en.wikipedia.org/wiki/Backpropagation
I varied the learning rate from 0.001 to 1 and trained for up to 1,000,000 iterations.
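For reference, here is a minimal sketch of a setup like the one described (my own reconstruction, not the actual code; the 2-2-1 architecture, the learning rate of 0.5, and the squared-error loss are assumptions, since the question only specifies the sigmoid, the biases, and the [0, 1] weight initialization):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(rng, lr=0.5, iters=100_000):
    """Train a 2-2-1 sigmoid network on XOR with plain backprop; return final outputs."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # Weights and biases drawn uniformly from [0, 1], as in the question.
    W1 = rng.uniform(0, 1, (2, 2)); b1 = rng.uniform(0, 1, (1, 2))
    W2 = rng.uniform(0, 1, (2, 1)); b2 = rng.uniform(0, 1, (1, 1))

    for _ in range(iters):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)      # hidden activations, shape (4, 2)
        out = sigmoid(h @ W2 + b2)    # network outputs, shape (4, 1)

        # Backward pass for squared-error loss; sigmoid'(s) = s * (1 - s).
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)

        # Plain gradient-descent updates.
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

    return out

print(train_xor(np.random.default_rng(0)))
```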
Best Answer
Yes, this is caused by the random initialization: training can get stuck in a local minimum. There are 16 local minima that are converged to most often when the weights are initialized between 0.5 and 1.
Image source: Yoshio Hirose, Koichi Yamashita, Shimpei Hijiya, "Back-propagation algorithm which varies the number of hidden units," Neural Networks, Volume 4, Issue 1 (1991).
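One way to see this effect in practice (a quick experiment of my own, not from the cited paper, reusing the hypothetical train_xor sketch from the question above) is to restart training many times with fresh random weights and count how often the network ends up far from the target outputs:

```python
import numpy as np

y = np.array([[0], [1], [1], [0]], dtype=float)
runs, failures = 50, 0
for seed in range(runs):
    out = train_xor(np.random.default_rng(seed), lr=0.5, iters=50_000)
    # Count a run as failed if any output is badly off, e.g. 0.67 instead of 0.
    if np.max(np.abs(out - y)) > 0.4:
        failures += 1
print(f"{failures}/{runs} runs failed to learn XOR")
```

If the failure rate with [0, 1] initialization looks similar to the 1-in-5-to-10 rate reported in the question, that supports the local-minimum explanation; a run that gets stuck stays stuck no matter how many extra iterations it is given, which is what distinguishes a local minimum from merely slow convergence.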