Solved – To train a neural network, do I normalize all input data at once, or row by row individually

neural networks, normalization

I want to train a neural network on a classification task, and I understand that normalizing the data helps the network converge faster. Let's assume I normalize my data via

norm_data = (data - mean) / standard_deviation

My question is: Do I compute the mean and standard deviation of the whole input data set, or do I do this separately for every row? I found examples for both methods, and now I wonder which one is better.

Is there a better one, is it case-dependent, or is this not important at all?
In my specific data, all input values can theoretically appear in approximately the same range.
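
For clarity, here is a rough R sketch of the two options I have in mind (the matrix "data" below is just a placeholder, with one row per instance and one column per feature):

# "data" is a placeholder: one row per instance, one column per feature
data <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 4)

# Option 1: mean and sd per feature, computed over the whole data set
norm_whole <- scale(data)

# Option 2: mean and sd computed separately for every row
norm_rows <- t(apply(data, 1, function(r) (r - mean(r)) / sd(r)))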

Best Answer

Normalizing over the whole input data set is safer, since the normalization then applies consistently to every row.

It is also somewhat case-dependent, since you have found examples of normalizing separately for every row; however, if the feature ranges vary a lot across instances, instances with widely different original features can end up looking very similar after being normalized row by row.

For example, (3, 3, 3, -3, -3, -3) and (1, 1, 1, -1, -1, -1) may be very different instances with different expected labels, yet row-wise normalization maps them to exactly the same vector. The following R script shows this:

> a = c(3,3,3,-3,-3,-3);
> b = c(1,1,1,-1,-1,-1);

> a / sd(a)
[1]  0.9128709  0.9128709  0.9128709 -0.9128709 -0.9128709
[6] -0.9128709
> b / sd(b)
[1]  0.9128709  0.9128709  0.9128709 -0.9128709 -0.9128709
[6] -0.9128709

The row means of a and b are both 0, so dividing by the standard deviation is already the full (x - mean)/sd normalization here. After normalizing separately for every row, a and b become identical inputs, so any model will always produce the same prediction for both, which is usually undesirable.
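
In contrast, if the mean and standard deviation are computed per feature over the whole data set, the two instances stay distinct. A rough sketch continuing the example above, where the "data set" is just a and b stacked as rows and R's built-in scale() standardizes each column:

X <- rbind(a, b)     # the two instances stacked as rows of one small data set
scale(X)             # per column: (x - column mean) / column sd
# a and b now map to different (in fact opposite) normalized rows,
# so the distinction between the two instances is preserved.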
