Solved – Data normalization and standardization in neural networks

machine learning, neural networks, normalization, standardization

I am trying to predict the outcome of a complex system using artificial neural networks (ANNs). The outcome (dependent) values range between 0 and 10,000. The input variables have different ranges. All the variables have roughly normal distributions.

I am considering different options for scaling the data before training. One option is to scale the input (independent) and output (dependent) variables to [0, 1] by applying each variable's cumulative distribution function, computed from that variable's mean and standard deviation independently. The problem with this method is that if I use a sigmoid activation function at the output, I will very likely miss extreme data, especially values not seen in the training set.
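A minimal sketch of that CDF-based scaling, assuming a fitted normal distribution per variable (function and variable names here are illustrative, and `scipy` is assumed to be available):

```python
import numpy as np
from scipy.stats import norm

def cdf_scale(x):
    """Map a roughly normal 1-D array into (0, 1) via its fitted normal CDF."""
    mu, sigma = x.mean(), x.std()
    return norm.cdf(x, loc=mu, scale=sigma)

# Illustrative data on the 0-10,000 outcome scale.
rng = np.random.default_rng(0)
x = rng.normal(loc=5000.0, scale=1500.0, size=1000)
scaled = cdf_scale(x)
# In-sample values land strictly inside (0, 1); an unseen extreme value would
# be squashed toward 0 or 1, which is the saturation problem described above.
```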

Another option is to use z-scores. In that case I don't have the extreme-data problem; however, I'm then limited to a linear activation function at the output.

What other accepted normalization techniques are in use with ANNs? I tried to look for reviews on this topic but failed to find anything useful.

Best Answer

A standard approach is to scale the inputs to have mean 0 and variance 1. Linear decorrelation / whitening / PCA also helps a lot.
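A minimal sketch of that pipeline in plain NumPy: standardization followed by ZCA whitening via an eigendecomposition of the covariance matrix (the `eps` regularizer guarding against near-zero eigenvalues is an illustrative choice):

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def zca_whiten(X, eps=1e-5):
    """Decorrelate the columns so their covariance is approximately the identity."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)  # eigendecomposition of the covariance
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W
```

After whitening, each input dimension contributes on a comparable scale and the dimensions are uncorrelated, which tends to make gradient descent better conditioned.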

If you are interested in the tricks of the trade, I can recommend LeCun's "Efficient BackProp" paper.