Solved – What’s the difference between Normalization and Standardization

descriptive statistics, normalization, standardization

At work we were discussing this, as my boss had never heard of normalization. In linear algebra, normalization seems to refer to dividing a vector by its length. In statistics, standardization seems to refer to subtracting the mean and then dividing by the SD. But the two terms also seem to be used interchangeably, and with other meanings as well.

When creating some kind of universal score made up of $2$ different metrics, which have different means and different SDs, would you normalize, standardize, or do something else? One person told me it's just a matter of taking each metric and dividing it by its SD, individually, then summing the two, and that this will result in a universal score that can be used to judge both metrics.

For instance, say you had the number of people who take the subway to work (in NYC) and the number of people who drove to work (in NYC).

$$\text{Train} \longrightarrow x$$
$$\text{Car} \longrightarrow y$$

If you wanted to create a universal score to quickly report traffic fluctuations, you can't just add $\text{mean}(x)$ and $\text{mean}(y)$, because there will be a LOT more people who ride the train. There are 8 million people living in NYC, plus tourists. That's millions of people taking the train every day versus hundreds of thousands of people in cars. So they need to be transformed to a similar scale in order to be compared.

If $\text{mean}(x) = 8,000,000$

and $\text{mean}(y) = 800,000$

Would you normalize $x$ and $y$, then sum? Would you standardize $x$ and $y$, then sum? Or would you divide each by its respective SD, then sum? The goal is to get a number whose fluctuations represent total traffic fluctuations.

Any article or chapters of books for reference would be much appreciated. THANKS!

Also here's another example of what I'm trying to do.

Imagine you're a college dean, and you're discussing admission requirements. You may want students with at least a certain GPA and a certain test score. It'd be nice if they were both on the same scale, because then you could just add the two together and say, "anyone with at least a 7.0 can get admitted." That way, if a prospective student has a 4.0 GPA, they could get as low as a 3.0 test score and still get admitted. Conversely, if someone had a 3.0 GPA, they could still get admitted with a 4.0 test score.

But it's not like that. The ACT is on a 36-point scale and most GPAs are on a 4.0 scale (some are 4.3, yes, annoying). Since I can't just add an ACT score and a GPA to get some kind of universal score, how can I transform them so they can be added, thus creating a universal admission score? Then, as a dean, I could automatically accept anyone with a score above a certain threshold, or even automatically accept everyone whose score is within the top 95%… those sorts of things.

Would that be normalization? Standardization? Or just dividing each by its SD and then summing?

Best Answer

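Normalization (min-max scaling) rescales the values into the range [0, 1]. This can be useful in cases where all parameters need to share the same positive scale. However, it is sensitive to outliers: a single extreme value sets $X_{min}$ or $X_{max}$ and squeezes the remaining values into a narrow part of the range.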

$$ X_{changed} = \frac{X - X_{min}}{X_{max}-X_{min}} $$
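As a minimal sketch of the formula above, assuming plain Python lists and made-up ACT scores (the function name `min_max_normalize` and the data are illustrative, not from the question):

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The denominator X_max - X_min would be zero.
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ACT scores, used only to illustrate the formula.
act_scores = [18, 24, 30, 36]
normalized = min_max_normalize(act_scores)
print(normalized)
```

Note that the result depends entirely on the observed minimum and maximum, so adding one very low or very high score would shift every other normalized value.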

Standardization rescales data to have a mean ($\mu$) of 0 and standard deviation ($\sigma$) of 1 (unit variance).

$$ X_{changed} = \frac{X - \mu}{\sigma} $$
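A minimal sketch of this formula using only the Python standard library. It assumes the population standard deviation (`statistics.pstdev`) for $\sigma$; using the sample standard deviation (`statistics.stdev`) instead is an equally common choice. The function name `standardize` and the data are illustrative.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population SD
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical data with mean 5 and population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(data)
print(z)
```

After the transformation the values have mean 0 and standard deviation 1, but unlike min-max scaling they are not confined to a fixed range.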

For most applications standardization is recommended.
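Applied to the admissions example from the question, one possible approach is to standardize each metric separately and add the resulting z-scores. The sketch below assumes equal weighting of GPA and ACT and uses invented applicant data; the helper `standardize` and the variable names are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores using the population standard deviation (an assumption)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Invented applicants: GPA on a 4.0 scale, ACT on a 36-point scale.
gpas = [4.0, 3.0, 3.5, 3.5]
acts = [24, 36, 30, 36]

# Each metric is put on a common, unitless scale before summing.
composite = [g + a for g, a in zip(standardize(gpas), standardize(acts))]

for gpa, act, score in zip(gpas, acts, composite):
    print(f"GPA {gpa:.1f}, ACT {act}: composite {score:+.2f}")
```

A composite of 0 then means "average on both metrics combined", and a threshold can be set on this combined score. Whether the two metrics deserve equal weight is a separate decision that the scaling itself does not answer.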
