Solved – Training neural network on skewed dataset: output always 1

backpropagation, machine learning, neural networks, python, regression

I'm learning neural networks and wrote a network from scratch using numpy and pandas. I'm training it with stochastic gradient descent to predict house prices. The dataset is right-skewed; I mean-normalized all variables except the target, and the predictors are a mix of binary and numerical features. The activation function is sigmoid.

The problem I'm facing is that the output on test data is around 1 for all observations, while the actual prices are on the order of 100k.
I tried normalizing the target (price) in the training set, but then the output on test data was close to zero for all inputs.
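The target normalization looked roughly like this (a minimal sketch, not my exact code; the column name and values are made up):

import pandas as pd

train = pd.DataFrame({"price": [95_000.0, 150_000.0, 320_000.0, 410_000.0]})

# Mean normalization: subtract the mean, divide by the range
price_mean = train["price"].mean()
price_range = train["price"].max() - train["price"].min()
train["price_norm"] = (train["price"] - price_mean) / price_range

# Predictions on the normalized scale have to be mapped back to dollars
pred_norm = 0.1  # example network output on the normalized scale
pred_dollars = pred_norm * price_range + price_mean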

This is how I initialized the weights and biases (sizes is a list containing the number of neurons in each layer):

import numpy as np

# One bias vector per non-input layer, drawn with std 1/sqrt(layer size)
self.biases = [np.random.normal(0.0, b**-.5, (b, 1)) for b in self.sizes[1:]]
# One weight matrix per layer pair, drawn with std sqrt(fan-in)
self.weights = [np.random.normal(0.0, y**.5, (x, y))
                for x, y in zip(self.sizes[1:], self.sizes[:-1])]
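For example, with sizes = [3, 5, 1], a standalone check of the shapes this produces (outside the class):

import numpy as np

sizes = [3, 5, 1]
biases = [np.random.normal(0.0, b**-.5, (b, 1)) for b in sizes[1:]]
weights = [np.random.normal(0.0, y**.5, (x, y))
           for x, y in zip(sizes[1:], sizes[:-1])]

print([b.shape for b in biases])   # [(5, 1), (1, 1)]
print([w.shape for w in weights])  # [(5, 3), (1, 5)]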

What should I do in order to get reasonable outputs?

[Figure: histogram of the selling price of houses]

Best Answer

The sigmoid function, pictured below, squeezes all input to fit between $0$ and $1$, as you probably know. Even a super-bad fit should produce some outputs above $1$; the fact that yours never do made me suspect that you passed your final prediction through a sigmoid function by mistake. The sigmoid is only appreciably different from $0$ or $1$ roughly in the range $x \in [-5, 5]$.

[Figure: plot of the sigmoid function]
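A quick numerical check of that squashing (a minimal sketch, assuming the standard logistic sigmoid):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, -5.0, 0.0, 5.0, 10.0])))
# roughly [4.5e-05, 6.7e-03, 0.5, 0.993, 0.99995]

# Any pre-activation on the scale of a house price saturates completely:
print(sigmoid(100_000.0))  # 1.0 to machine precision

If the final layer is supposed to output a price directly, the usual fix is a linear (identity) activation on the output layer; alternatively, keep the sigmoid, rescale the target into its $(0, 1)$ output range during training, and invert that scaling at prediction time.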