Normalization vs Scaling – Key Differences and Uses

data-transformation, normality-assumption, normalization, scales

What is the difference between data 'Normalization' and data 'Scaling'? Until now I thought both terms referred to the same process, but now I realize there is something more that I don't know or understand. Also, if there is a difference between Normalization and Scaling, when should we use Normalization but not Scaling, and vice versa?

Please elaborate with some examples.

Best Answer

I am not aware of an "official" definition, and even if there were one, you shouldn't trust it, because you will see these terms used inconsistently in practice.

This being said, scaling in statistics usually means a linear transformation of the form $f(x) = ax+b$.
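For instance, min-max scaling to $[0, 1]$ is a linear transformation of exactly this form, with $a = 1/(\max - \min)$ and $b = -\min/(\max - \min)$. A minimal sketch in Python (NumPy and the sample values are my own assumptions, just for illustration):

```python
import numpy as np

# A made-up 1-D sample; any numeric array works.
x = np.array([2.0, 5.0, 7.0, 10.0])

# Min-max scaling to [0, 1] is a linear transformation f(x) = a*x + b
# with a = 1 / (max - min) and b = -min / (max - min).
a = 1.0 / (x.max() - x.min())
b = -x.min() / (x.max() - x.min())
x_scaled = a * x + b  # equivalently (x - x.min()) / (x.max() - x.min())

print(x_scaled)  # [0.    0.375 0.625 1.   ]
```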

Normalizing can either mean applying a transformation so that your transformed data are roughly normally distributed, or it can simply mean putting different variables on a common scale. Standardizing, which means subtracting the mean and dividing by the standard deviation, is an example of the latter usage. As you may see, it is also an example of scaling. An example of the first usage is taking the log of lognormally distributed data.
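To make the two usages concrete, here is a short sketch (Python with NumPy, chosen by me since the question names no language; the distribution parameters are arbitrary). Standardizing is itself a linear scaling, while log-transforming lognormal data targets approximate normality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standardizing = subtracting the mean and dividing by the standard deviation.
# This is a linear scaling f(x) = a*x + b with a = 1/sd and b = -mean/sd.
x = rng.normal(loc=50.0, scale=10.0, size=1000)
x_standardized = (x - x.mean()) / x.std()

# Normalizing in the "make it roughly normal" sense:
# lognormal data becomes approximately normal after taking the log.
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
y_normalized = np.log(y)

print(x_standardized.mean().round(3), x_standardized.std().round(3))  # ~0.0, ~1.0
print(y_normalized.mean().round(3), y_normalized.std().round(3))      # roughly 0.0 and 1.0
```

Note that the standardized data keeps whatever shape it had (it is only shifted and rescaled), whereas the log transform actually changes the shape of the distribution.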

The main takeaway is that when you encounter either term, you should look for a more precise description of what the author actually did. Sometimes you can infer it from the context.