I am running neural networks with a sigmoid activation function, and have output data which already lies within [0,1]. However, the minimum value of the data is around 0.05, and the maximum is about 0.2.
My question is this: Is there any benefit from using min-max normalization to "stretch" the data over the full output range of the sigmoid function? Or would it be best if I simply left it as it is? The closest thing I've found to an answer to my question in the literature is here:
"the dependent variable does not have to be converted when modeling a binary response variable because its values already fall within this range" (Olden & Jackson, 2002)
However, the obvious difference is that the output values of a classification problem clearly extend from 0 to 1, while I am wondering if the very tight range of my output variables will produce sub-optimal results.
Best Answer
You should try it both ways and see for yourself, as it'll largely depend on the dataset.
However, "stretching" the data can be somewhat helpful in this case from a numerical point of view. Your neural network is basically outputting some real-valued $u \in \mathbb{R}$, then you apply a sigmoid:
$$\sigma(u) = \frac{1}{1 + e^{-u}}$$
Since your data ranges over [0.05, 0.2], the pre-sigmoid output of the last layer, $u = \ln\frac{y}{1-y}$, has to stay within approximately [-2.94, -1.39]. If you scaled the data to the range [0, 1], then $u$ can be anything in (-$\infty$, $\infty$). When the data is unscaled, the network has to land in that narrow interval, so the last layer has to be that much more numerically precise/stable compared to the scaled version. If you have a very deep network or complicated architecture, or the data is very noisy, then I suspect there might be some differences. Or maybe not.
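To make the comparison concrete, here is a minimal sketch that computes the pre-sigmoid range from the inverse sigmoid (the logit) and shows min-max scaling of the targets together with the inverse mapping you would apply to predictions. It assumes the standard logistic sigmoid; the helper names (`logit`, `minmax_scale`, `minmax_unscale`) and the margin `eps` are illustrative choices, not anything from the question.

```python
import math

def sigmoid(u):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))

def logit(y):
    """Inverse of the sigmoid: the pre-activation u with sigmoid(u) == y."""
    return math.log(y / (1.0 - y))

def minmax_scale(y, lo, hi):
    """Map a target y from [lo, hi] to [0, 1]."""
    return (y - lo) / (hi - lo)

def minmax_unscale(s, lo, hi):
    """Map a network output s in [0, 1] back to the original units."""
    return lo + s * (hi - lo)

Y_MIN, Y_MAX = 0.05, 0.2  # target range quoted in the question

# Pre-sigmoid range the network must produce for the unscaled targets.
u_lo, u_hi = logit(Y_MIN), logit(Y_MAX)
print(f"unscaled targets: u in [{u_lo:.2f}, {u_hi:.2f}]")

# After scaling, the targets cover [0, 1]. The endpoints 0 and 1 would need
# u = -inf and +inf, so look at a small margin inside the interval instead.
eps = 0.01  # illustrative margin (an assumption)
s_lo, s_hi = logit(eps), logit(1.0 - eps)
print(f"scaled targets:   u in [{s_lo:.2f}, {s_hi:.2f}] for eps = {eps}")

# Round trip: scale a target for training, then map a prediction back.
y = 0.12
s = minmax_scale(y, Y_MIN, Y_MAX)
y_back = minmax_unscale(s, Y_MIN, Y_MAX)
print(f"y = {y}, scaled = {s:.4f}, recovered = {y_back:.4f}")
```

If you do scale, remember to compute the minimum and maximum on the training set only and to apply `minmax_unscale` to the network's predictions before reporting errors in the original units.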