Solved – Why center the data during feature scaling for neural networks

machine learning, standardization

Basically, feature scaling is done to transform the data to the same scale so that the gradients are not biased towards features with larger values.

Now, we do this by centering the data and dividing by the standard deviation –

import numpy as np

x = np.array([-20, -10, 20, 50])
y = (x - x.mean()) / x.std()  # subtract the mean, then divide by the std
print(y)

I get the scaled values as –

[-1.09544512 -0.73029674  0.36514837  1.46059349]

But if we only want to scale the values, why subtract the mean? We could simply divide by the standard deviation, i.e.

import numpy as np

x = np.array([-20, -10, 20, 50])
y = x / x.std()  # scale only, without centering
print(y)

And I get the scaled values as –

[-0.73029674 -0.36514837  0.73029674  1.82574186]

What purpose does centering the data serve?

Best Answer

We subtract the mean because we aim not only to scale the data, but to normalise it, so that it is also centred at zero.
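A quick way to see the difference is to check the mean and standard deviation of both transforms (a minimal numpy sketch, reusing the array from the question):

import numpy as np

x = np.array([-20, -10, 20, 50])

# Full standardisation: centre, then scale.
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())   # ~0.0, 1.0 -- centred at zero with unit spread

# Scale-only version: unit spread, but the mean is untouched.
s = x / x.std()
print(s.mean(), s.std())   # ~0.365, 1.0 -- still shifted away from zero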

I think a good intuition for why we need centering can be obtained by considering batch normalisation and sigmoid activation functions.

[Figure: plot of the sigmoid activation function]

If you look at the sigmoid activation function, notice that the largest gradients occur in the middle of the sigmoid, where it most closely approximates a linear function. If you feed in a lot of very large or very small values, you saturate the activation function and get something that basically approximates a flat line. This effectively slows the convergence of the algorithm, because the optimisation first has to find an appropriate scaling of the parameters before the gradients stop being saturated; once they are not saturated, learning can proceed faster.
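To make the saturation concrete, the sigmoid's derivative is σ'(z) = σ(z)(1 − σ(z)), which peaks at z = 0 and decays quickly away from it. A minimal numpy sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid

for z in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  grad={sigmoid_grad(z):.6f}")

# z=  0.0  grad=0.250000   <- largest gradient, at the centre
# z=  1.0  grad=0.196612
# z=  2.0  grad=0.104994
# z=  5.0  grad=0.006648   <- already close to saturated
# z= 10.0  grad=0.000045   <- effectively a flat line

Centering pulls the bulk of the inputs towards z = 0, i.e. into the region where the gradient is largest.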

In essence, centering the data means that only the outliers saturate, which can be desirable because we often want outliers to carry relatively less importance, as the sketch below shows.
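A rough sketch of this effect (the feature values below are made up purely for illustration):

import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Hypothetical feature: 19 well-behaved points plus one large outlier.
x = np.concatenate([np.linspace(-3, 3, 19), [100.0]])
z = (x - x.mean()) / x.std()
g = sigmoid_grad(z)

print(g[:-1].min())  # ~0.24  -- the bulk stays in the high-gradient region
print(g[-1])         # ~0.013 -- only the outlier is pushed into saturation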
