Solved – Neural network always returns same result

neural-networks regression

I'm writing my own implementation of a neural network to test my knowledge, and while it seems to run okay, it converges such that the output is always the mean value (0.5, since I'm using a logistic output activation) regardless of the input, and nothing I do seems to change that. There are a few similar questions on here, but I've tried the suggestions there to no avail.

So far I've tried the network on an XOR gate, and on the function f(a, b) = a + b for a, b in the range [0, 0.5), using a squared-difference loss function. Both of these networks have two inputs and one output, and I've tried a number of architectures, including 1 and 2 hidden layers with about 3-8 nodes in each. All of the neurons use a logistic activation. Stochastic versus mini-batch updates, and with or without biases, all seemed to make no difference; the same goes for the learning rate, for which I've tried a number of values between 0 and 1. The only thing I can think of is that it's the initialisation (I'm currently initialising weights from a uniform distribution with small min/max centred about 0, roughly as sketched below), but I wouldn't know how to fix that even if it were wrong.
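For concreteness, the initialisation I'm describing is roughly along these lines (a sketch rather than my exact code; init_weights and the 0.1 bound are just illustrative):

# Sketch of the initialisation described above: small uniform weights centred on 0.
# init_weights and eps are illustrative names/values, not my actual code.
init_weights <- function(n_in, n_out, eps = 0.1) {
  matrix(runif(n_in * n_out, min = -eps, max = eps), nrow = n_in, ncol = n_out)
}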

I have looked at the deltas I'm getting for the weights, and they seem sensible. After training for a while the weights look fairly sensible too, most between -1 and 1, though there are sometimes a few that are a lot larger. The training error starts off relatively high, then comes down within a few epochs and stays there, sometimes even increasing slightly. Any help would be greatly appreciated.
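(For completeness, one check I haven't done yet is comparing the backprop deltas against finite-difference gradients; a sketch of what I mean is below, with loss_fn and backprop_grad as placeholders for my own functions.)

# Finite-difference check of a single weight's gradient (sketch only);
# loss_fn() and backprop_grad() are placeholders for my own functions.
grad_check <- function(w, i, loss_fn, backprop_grad, eps = 1e-5) {
  w_plus <- w;  w_plus[i]  <- w_plus[i] + eps
  w_minus <- w; w_minus[i] <- w_minus[i] - eps
  num_grad <- (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
  ana_grad <- backprop_grad(w)[i]
  c(numeric = num_grad, analytic = ana_grad)  # these should agree closely
}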

Best Answer

[An extended comment rather than an "answer", since I think a proper answer would be difficult to give from the information provided.]

I tried to quickly replicate the scenario you're looking at with mxnet. The result was accuracy > 98%, with a decent spread in the output; a histogram of the logistic output against the target variable is below. I also tried targeting RMSE instead and still got good results. Apologies if I misunderstood your description of the test-case data!

library(mxnet)

# 500 random (a, b) pairs; the target is 1 when both a and b are below 0.5
x <- data.frame(a = runif(500), b = runif(500))
y <- as.numeric((x$a < 0.5) * (x$b < 0.5))
x <- data.matrix(x, rownames.force = NA)

# two-class MLP with a single hidden layer of 5 nodes
model <- mx.mlp(x, y, hidden_node=5, out_node=2, out_activation="softmax",
                num.round=100, array.batch.size=15, learning.rate=0.07, momentum=0.9,
                eval.metric=mx.metric.accuracy)  # or eval.metric=mx.metric.rmse

[Histogram of predicted output vs. target variable]
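For reference, a histogram like this can be produced along these lines (predict() on an mx.mlp softmax model returns a matrix of class probabilities, one column per example):

# predicted probability of class 1 for each example
pred <- predict(model, x)   # 2 x 500 matrix of class probabilities
hist(pred[2, ], breaks = 20,
     main = "Predicted P(y = 1)", xlab = "predicted probability")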

Update: I played around some more and found that training the network set up below gets stuck at something that predicts a near-constant value (generate the data above with set.seed(0) as well). So perhaps there is some phenomenon happening here that someone more experienced can comment on.

set.seed(0)
# two tanh hidden layers of 3 nodes each
model <- mx.mlp(x, y, hidden_node=c(3,3), out_node=2, out_activation="softmax", activation="tanh",
                num.round=50, array.batch.size=15, learning.rate=0.07, momentum=0.9,
                eval.metric=mx.metric.accuracy)  # or eval.metric=mx.metric.rmse

With initializer = mx.init.uniform(1) (i.e. initial weights drawn uniformly from [-1, 1]) this seems to go away. Generally, with 2 hidden layers there seem to be too many degrees of freedom.
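For reference, mx.mlp forwards extra arguments to mx.model.FeedForward.create, so the initializer can be set directly in the same call (a sketch of that variant):

set.seed(0)
model <- mx.mlp(x, y, hidden_node=c(3,3), out_node=2, out_activation="softmax", activation="tanh",
                num.round=50, array.batch.size=15, learning.rate=0.07, momentum=0.9,
                initializer=mx.init.uniform(1),  # wider uniform [-1, 1] initial weights
                eval.metric=mx.metric.accuracy)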