Why normalize the vectors to calculate the Pearson correlation coefficient?

correlation, pearson-r, r-squared, regression

I learned from this answer that the correlation $R$ is $\cos(\theta)$, where $\theta$ is the angle between a dependent vector $Y$ and an independent vector $X$. However, I learned from this article that the two vectors are the normalized ones (each with its own mean subtracted).

I believe the normalized version is correct for two reasons: 1) the unnormalized version doesn't make the correlation invariant to scaling and shifting; 2) the result of the normalized version is exactly the correlation formula:

$$R=\frac{1}{n-1}\sum_{i=1}^n\frac{x_i-\bar x}{s_x}\frac{y_i-\bar y}{s_y}$$
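The equivalence between this formula and the cosine of the angle between the mean-centered vectors can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`pearson_from_formula`, `cosine`, `center`) and the sample data are illustrative assumptions, not part of the original question.

```python
import math

def center(v):
    """Subtract the mean from every element of v."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_from_formula(x, y):
    """Pearson R as the sum of products of standardized scores over n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample standard deviations (denominator n - 1), matching s_x and s_y.
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_formula = pearson_from_formula(x, y)
r_cosine = cosine(center(x), center(y))
print(r_formula, r_cosine)  # the two values agree up to rounding
```

The factors of $n-1$ cancel between the sum and the two standard deviations, which is why the formula reduces to a plain cosine of the centered vectors.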

But why should the two vectors be normalized?

Best Answer

The practical difference between the centered (normalised) and uncentered versions is that, for the Pearson product-moment correlation coefficient, it is possible to construct a hypothesis test with the following null and alternative hypotheses:

$$H_0: \rho = 0 \quad \text{vs.} \quad H_a: \rho \neq 0$$
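The answer does not say which test it has in mind. One common choice, assuming the data are bivariate normal, uses the statistic $t = r\sqrt{(n-2)/(1-r^2)}$, which follows a $t$ distribution with $n-2$ degrees of freedom under the null hypothesis. A minimal sketch of that calculation, with hypothetical helper names and sample data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation as the cosine of the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

def t_statistic(r, n):
    """t statistic for testing rho = 0; undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(r, t)  # compare |t| with a t critical value on n - 2 degrees of freedom
```

The standard library has no $t$ distribution, so the sketch stops at the statistic; a p-value would need a table or a statistics package.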

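Note that the uncentered version does exist and is called Tucker's congruence coefficient (despite having first been published by Cyril Burt in 1948). Moreover, the geometrical meaning of the Pearson and Tucker coefficients is the same: each is the cosine of the angle between two vectors, computed on the mean-centered vectors for Pearson and on the raw vectors for Tucker.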
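To make the contrast concrete, here is a minimal sketch of both coefficients in plain Python. The function names and the data are illustrative assumptions. It shows that adding a constant to one variable changes the congruence coefficient but leaves the Pearson coefficient unchanged.

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient: the cosine of the raw vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def center(v):
    """Subtract the mean from every element of v."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def pearson(x, y):
    """Pearson correlation: the same cosine, applied to centered vectors."""
    return congruence(center(x), center(y))

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
y_shifted = [b + 10.0 for b in y]  # same shape, different origin

print(congruence(x, y), congruence(x, y_shifted))  # the two values differ
print(pearson(x, y), pearson(x, y_shifted))        # the two values are equal
```

Multiplying a vector by a positive constant leaves both coefficients unchanged, since the constant cancels in the cosine. Only the shift separates them.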


References

Burt, C. (1948). The factorial study of temperamental traits. British Journal of Mathematical and Statistical Psychology, 1(3), 178–203.

Tucker, L. R. (1951). A method for synthesis of factor analysis studies (No. PRS-984). Princeton: Educational Testing Service.