Solved – Normalization – Deep Learning

deep learning, machine learning, mean, normalization, variance

I understand how to normalize a matrix/vector so as to get it to a smaller scale and speed up subsequent tasks. I am curious to know why the zero-mean matrix is divided by the variance and how that helps.
Edit: I am unable to understand the intuition behind dividing the matrix by the variance.

Best Answer

Usually the vector is divided by the standard deviation (the square root of the variance) rather than the variance. The reason is to put things on a similar scale, and it also has the effect of removing units. Imagine your data measures height: it could be recorded in millimeters, centimeters, inches, feet, meters, kilometers, miles, or other units. If you divide by the standard deviation, you end up with something unitless, so it does not matter which unit was used originally. Dividing by the variance would not do this, because the variance carries squared units, so the result would still depend on the unit of measurement.

If your data is distributed in a roughly symmetric mound shape, then about two thirds (68%) of your values will be between -1 and 1, about 95% will be between -2 and 2, and almost all (99.7%) will be between -3 and 3. Even if the distribution is very non-symmetric and not mound-shaped, Chebyshev's inequality still guarantees at least 75% between -2 and 2 and roughly 89% or more between -3 and 3.

So things end up on a nice scale, without too many values really close to 0 (like measuring the heights of people in miles) or too far from 0 (like measuring the heights of mountains in millimeters).
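
To make the scale argument concrete, here is a minimal sketch (assuming NumPy and a made-up height dataset): standardizing the same measurements recorded in centimeters and in millimeters produces identical z-scores, and for mound-shaped data most of those z-scores land between -3 and 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heights: the same quantity in two different units.
heights_cm = rng.normal(loc=170, scale=10, size=100_000)  # centimeters
heights_mm = heights_cm * 10                              # millimeters

def standardize(x):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    return (x - x.mean()) / x.std()

z_cm = standardize(heights_cm)
z_mm = standardize(heights_mm)

# The two standardized versions are identical: the units cancel out.
print(np.allclose(z_cm, z_mm))         # True

# For roughly mound-shaped data, most z-scores fall in a narrow range.
print(np.mean(np.abs(z_cm) <= 1))      # ~0.68
print(np.mean(np.abs(z_cm) <= 2))      # ~0.95
print(np.mean(np.abs(z_cm) <= 3))      # ~0.997
```

Dividing by the variance instead of the standard deviation would break the first property: the centimeter and millimeter versions would no longer agree, since the variance scales with the square of the unit.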