Solved – How should I normalise the inputs to a neural network

Tags: neural-networks, normalization

My neural network can take inputs from all sorts of datasets. For example, with digit recognition on the MNIST dataset, there are 784 inputs (one per pixel of the 28×28 image) and each value is a grayscale intensity between 0 and 255. However, feeding these in raw produces math range errors with the sigmoid function, because very large negative values show up in later layers. So all values are divided by 255 to get decimals between 0 and 1. Is that correct?
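For concreteness, here is a minimal sketch of the divide-by-255 scaling I mean (the random array is just a stand-in for the actual MNIST pixel data):

```python
import numpy as np

# Stand-in for a batch of MNIST images: integers in [0, 255],
# flattened to 784 features per sample (28 x 28 pixels).
pixels = np.random.randint(0, 256, size=(64, 784)).astype(np.float32)

# Divide by the maximum possible pixel value so every input lies in [0, 1].
scaled = pixels / 255.0

print(scaled.min(), scaled.max())  # both within [0.0, 1.0]
```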

Does this mean that with other datasets that have large values, you can just divide them all by 10 or 100 to make them smaller? Does the choice of divisor matter? What is this process known as?

Best Answer

This process is known as "normalization" or "transformation" and is part of your feature engineering. Your problem applies to all machine learning algorithms, not just neural networks.

We usually prefer values in [0, 1] because they are easier to work with. There is no fixed rule on what to normalize or how, but you should ask yourself:

  • Are my variables on a comparable scale?
  • Does my machine learning require normalization?
  • Is my variable discrete, and should I transform it to a continuous one?

You certainly shouldn't divide your variables by arbitrary numbers just to make them smaller. In your image classification example, dividing by 255 works well because it maps the whole range into [0, 1]: no pixel can be less than 0 or greater than 255, so no scaled value falls outside that interval.
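When the range of a feature isn't known in advance, a common choice is min-max normalization, which rescales each feature to [0, 1] using its observed minimum and maximum. A minimal sketch (the function name and toy data below are only illustrative):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column (feature) of x to the range [0, 1]."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Guard against constant columns, where max == min would divide by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span

# Toy dataset with two features on very different scales.
data = np.array([[1000.0, 0.2],
                 [5000.0, 0.8],
                 [3000.0, 0.5]])
print(min_max_normalize(data))
```

Note that the minimum and maximum should be computed on the training data only and then reused for validation and test data, so that all splits are scaled consistently.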