Solved – Normalizing continuous features using sigmoid function

feature-engineering, machine-learning, neural-networks, normalization, sigmoid-curve

Can you use the sigmoid function to normalize continuous features that have no theoretical maximum value but tend to cluster around [-1, 1]?

Although using the sigmoid function would be a non-linear normalization, my intuition is that the deep neural network or machine learning model I am training would learn that the continuous feature is not linearly normalized, and would adapt and do fine. Is this correct? Can the sigmoid function, or any other non-linear normalization method (e.g. tanh), be used for continuous features?

Best Answer

The description you give is basically what a sigmoid feed-forward neural network does in its hidden layers: find $a,b$ so that $\sigma(x|a,b)$ minimizes some loss, where $\sigma$ is any sigmoid function; for example, you could choose $\sigma(x|a,b)=\tanh(ax+b)$. Depending on the choice of $a$ and $b$, the function can be essentially constant at a large value, essentially constant at a small value, approximately linear, or some mix of the three.
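The three regimes are easy to see numerically. A small sketch, using the answer's example $\sigma(x|a,b)=\tanh(ax+b)$ with hand-picked $a,b$ values (the specific numbers are just for illustration):

```python
import numpy as np

def sigma(x, a, b):
    # The answer's example sigmoid: tanh(a*x + b)
    return np.tanh(a * x + b)

x = np.linspace(-1, 1, 5)  # inputs clustered around [-1, 1]

# Large positive offset: output is nearly constant near +1.
high = sigma(x, a=0.1, b=5.0)
# Large negative offset: output is nearly constant near -1.
low = sigma(x, a=0.1, b=-5.0)
# Small slope, zero offset: tanh(z) ~ z near 0, so the map is nearly linear.
linear = sigma(x, a=0.1, b=0.0)

print(high)
print(low)
print(linear)
```

The "approximately linear" regime follows from the Taylor expansion $\tanh(z) \approx z$ for small $z$, which is why a learned $a,b$ can recover an almost-linear rescaling when that is what minimizes the loss.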

This is presented in terms of scalar-valued functions, but NNs with more than one unit use matrix-vector products.
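For completeness, a sketch of the matrix-vector form: a hidden layer stacks one $(a, b)$ pair per unit into a weight matrix $W$ and bias vector $b$, and applies the sigmoid elementwise (the layer sizes here are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden layer with 3 units on a 2-dimensional input:
# each unit computes tanh(w . x + b_i), vectorized as tanh(W @ x + b).
W = rng.normal(size=(3, 2))  # row i holds the "a" parameters of unit i
b = rng.normal(size=3)       # entry i holds the "b" parameter of unit i

x = np.array([0.5, -0.2])
h = np.tanh(W @ x + b)  # 3 activations, each in (-1, 1)
print(h)
```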