When I normalize the standardized values of data, why do I get the same values as just normalizing?

data transformation, normalization, standardization

Please consider the following example:

[Image: table with four columns: VAR1, standardized VAR1, normalized VAR1, and normalized standardized VAR1]

In the second column, I standardize VAR1, subtracting the mean and dividing by the standard deviation. In the third column, I normalize VAR1 with the formula:

$$\frac{x - \min(x)}{\max(x)-\min(x)}$$

In the fourth column, I normalize the standardized values of VAR1 (applying the formula above to column 2).

Can you tell me why columns 3 and 4 are the same?

Best Answer

The easy answer is that standardization and normalization are both increasing linear transformations of the data, so applying one after the other is again an increasing linear transformation, and any line is determined by two distinct points on it. Since columns 3 and 4 are both constructed to include the points $(\min(\text{data}), 0)$ and $(\max(\text{data}), 1)$ (namely, $(2,0)$ and $(95, 1)$), they must result from identical transformations.
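As a quick worked check of the two-point argument, using only the minimum $2$ and maximum $95$ quoted above: write column 4 as an unknown line $g(x) = ax + b$ in the original values (the symbols $g$, $a$, and $b$ are introduced here purely for illustration). The two known points pin down both coefficients:

$$\eqalign{ g(2) = 2a + b &= 0, \\ g(95) = 95a + b &= 1 \\ \Longrightarrow\quad a = \frac{1}{93},\quad b &= -\frac{2}{93}, \\ g(x) &= \frac{x - 2}{93}. }$$

That is exactly the normalization formula applied directly to VAR1, which is column 3.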

The algebraically rigorous answer manipulates the formulas. Recall that standardization of data $\mathbf{x} = x_1, x_2, \ldots, x_n$ replaces each $x_i$ with

$$y_i = \frac{x_i - \bar x}{s_x}$$

where $\bar x$ usually is the (arithmetic) mean of the data and $s_x$ often is a standard deviation of the data. (Generally, $\bar x$ may be any estimate of a central value and $s_x$ may be any estimate of the spread, but such generality is usually not intended.)

On the other hand, normalization replaces each $x_i$ with

$$z_i = \frac{x_i - \min(\mathbf{x})}{\max(\mathbf x) - \min(\mathbf x)}.$$

But since $s_x \gt 0$, division by $s_x$ is an increasing function. Subtracting any constant like $\bar x$ similarly is an increasing function. Therefore the extremes of $\mathbf x$ correspond to the extremes of $\mathbf y$: $\min(\mathbf y) = (\min(\mathbf x) - \bar x)/s_x$ and $\max(\mathbf y) = (\max(\mathbf x) - \bar x)/s_x$. Consequently, normalization of $\mathbf y = y_1, y_2, \ldots,y_n$ produces

$$\eqalign{ z_i^\prime &= \frac{y_i - \min(\mathbf{y})}{\max(\mathbf y) - \min(\mathbf y)} \\ &= \frac{\frac{x_i - \bar x}{s_x} - \min_j(\frac{x_j - \bar x}{s_x})}{\max_j(\frac{x_j - \bar x}{s_x}) - \min_j(\frac{x_j - \bar x}{s_x})} \\ &= \frac{{x_i - \bar x} - \min_j({x_j - \bar x})}{\max_j({x_j - \bar x}) - \min_j({x_j - \bar x})} \\ &= \frac{x_i - \min(\mathbf{x})}{\max(\mathbf x) - \min(\mathbf x)} \\ &= z_i. }$$
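The identity can also be checked numerically. The following is a minimal sketch, not the computation behind the question's table: the values in `var1` are hypothetical (only their minimum, 2, and maximum, 95, match the question), the helper names `standardize` and `normalize` are made up for illustration, and the sample standard deviation is assumed for $s_x$.

```python
import math
from statistics import mean, stdev


def standardize(xs):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def normalize(xs):
    """Min-max rescale the values to the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical data: only the minimum (2) and maximum (95) come from the question.
var1 = [2, 17, 38, 40, 61, 95]

col3 = normalize(var1)               # normalize the raw values
col4 = normalize(standardize(var1))  # normalize the standardized values

# Compare with a tolerance, because floating-point rounding may differ slightly.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(col3, col4))
print(same)
```

Any other positive divisor in place of `stdev` (for instance the population standard deviation) should give the same agreement, since the derivation only uses $s_x \gt 0$.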