Correlation – Relationship Among Cosine Similarity, Pearson Correlation, and Z-Score

correlation, cosine similarity, z-score

I'm wondering if there is any relationship among these 3 measures. I can't seem to make a connection among them by referring to the definitions (possibly because I am new to these definitions and am having a bit of a rough time grasping them).

I know the cosine similarity can range from 0 to 1 and the Pearson correlation from -1 to 1, but I'm not sure about the range of the z-score.

What I don't see is how a given value of the cosine similarity could tell you anything about the Pearson correlation or the z-score, or vice versa.

Best Answer

The cosine similarity between two vectors $a$ and $b$ is just the cosine of the angle between them $$\cos\theta = \frac{a\cdot b}{\lVert{a}\rVert \, \lVert{b}\rVert}$$ In general it ranges from $-1$ to $1$. In many applications that use cosine similarity, the vectors are non-negative (e.g. a term frequency vector for a document), and in this case the cosine similarity will also be non-negative, i.e. between 0 and 1.
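As a minimal sketch of the formula above (plain Python, no libraries; the function name `cosine_similarity` and the example vectors are made up for illustration), the first pair mimics term-frequency vectors and the second pair has mixed signs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

# Hypothetical term-frequency vectors: every entry is non-negative,
# so the dot product (and hence the cosine) cannot be negative.
doc1 = [3, 0, 1, 2]
doc2 = [1, 1, 0, 4]
tf_sim = cosine_similarity(doc1, doc2)

# Vectors with mixed signs can point in opposite directions,
# which gives a cosine of (approximately, in floating point) -1.
neg_sim = cosine_similarity([1, -2, 3], [-1, 2, -3])

print(tf_sim, neg_sim)
```

So the 0 to 1 range is a property of non-negative data, not of cosine similarity itself.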

For a vector $x$, the "$z$-score" vector would typically be defined as $$z_x=\frac{x-\bar{x}}{s_x}$$ where $\bar{x}=\frac{1}{n}\sum_ix_i$ is the mean of $x$ and $s_x=\sqrt{\overline{(x-\bar{x})^2}}$ is its standard deviation. So $z_x$ has mean 0 and standard deviation 1, i.e. $z_x$ is the standardized version of $x$.
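A small sketch of this standardization, assuming the population ($1/n$) convention for the standard deviation used in the definition above; the helper name `z_score` and the sample data are illustrative only:

```python
import math

def z_score(x):
    """Standardize x using the population (1/n) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / std for xi in x]

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample: mean 5, std 2
z = z_score(x)

# Check the claimed properties of the standardized vector.
z_mean = sum(z) / len(z)                                         # should be 0
z_std = math.sqrt(sum((zi - z_mean) ** 2 for zi in z) / len(z))  # should be 1
z_norm = math.sqrt(sum(zi * zi for zi in z))                     # should be sqrt(n)

print(z, z_mean, z_std, z_norm)
```

With the sample ($1/(n-1)$) standard deviation instead, the same idea applies but the norm of the z-score vector becomes $\sqrt{n-1}$.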

For two vectors $x$ and $y$, their correlation coefficient would be $$\rho_{x,y}=\overline{(z_xz_y)}=\frac{1}{n}\sum_i z_{x,i}\,z_{y,i}$$ i.e. the mean of the elementwise product of their z-scores.
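To make the "mean of the product of z-scores" reading concrete, here is a sketch under the same $1/n$ convention; `z_score`, `pearson`, and the data are hypothetical names and values chosen for illustration:

```python
import math

def z_score(x):
    """Standardize x using the population (1/n) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / std for xi in x]

def pearson(x, y):
    """Correlation coefficient as the mean of the elementwise z-score product."""
    zx, zy = z_score(x), z_score(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson(x, [2.0, 4.0, 6.0, 8.0, 10.0])   # exact increasing linear relation
r_down = pearson(x, [5.0, 4.0, 3.0, 2.0, 1.0])  # exact decreasing linear relation
r_mixed = pearson(x, [2.0, 1.0, 4.0, 3.0, 5.0]) # noisy increasing relation

print(r_up, r_down, r_mixed)
```

The three results land at the top, the bottom, and the interior of the $[-1, 1]$ range respectively (up to floating-point rounding).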

Now if the vector $a$ has zero mean, then its variance will be $s_a^2=\frac{1}{n}\lVert{a}\rVert^2$, so its unit vector and z-score will be related by $$\hat{a}=\frac{a}{\lVert{a}\rVert}=\frac{z_a}{\sqrt n}$$
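A quick numerical check of this identity, as a sketch with a made-up zero-mean vector (the variable names are only for illustration):

```python
import math

a = [-3.0, -1.0, 0.0, 1.0, 3.0]  # made-up vector whose entries sum to zero
n = len(a)

norm_a = math.sqrt(sum(ai * ai for ai in a))
# Population standard deviation; no centering needed because the mean is 0.
s_a = math.sqrt(sum(ai * ai for ai in a) / n)

unit_a = [ai / norm_a for ai in a]            # a / ||a||
z_a = [ai / s_a for ai in a]                  # z-score of a
scaled_z = [zi / math.sqrt(n) for zi in z_a]  # z_a / sqrt(n)

# The unit vector and the rescaled z-score agree entry by entry.
max_gap = max(abs(u - v) for u, v in zip(unit_a, scaled_z))
print(max_gap)
```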

So if the vectors $a$ and $b$ are centered (i.e. have zero means), then $$\cos\theta=\hat{a}\cdot\hat{b}=\frac{z_a\cdot z_b}{n}=\overline{(z_az_b)}=\rho_{a,b}$$ and their cosine similarity will be the same as their correlation coefficient.
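The following sketch checks this numerically on made-up data. The helper names (`cosine_similarity`, `center`, `pearson`) are assumptions for illustration, and `pearson` is deliberately computed from the covariance and standard deviations rather than from the cosine, so the comparison is not circular:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) *
                  math.sqrt(sum(bi * bi for bi in b)))

def center(x):
    """Subtract the mean so that the result has zero mean."""
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]

def pearson(x, y):
    """Population covariance divided by the product of population stds."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

raw_cos = cosine_similarity(x, y)                       # uncentered vectors
centered_cos = cosine_similarity(center(x), center(y))  # centered vectors
r = pearson(x, y)

print(raw_cos, centered_cos, r)
```

On this data the centered cosine matches the correlation, while the cosine of the raw (non-negative, uncentered) vectors is noticeably larger, so a cosine similarity computed without centering does not by itself pin down the Pearson correlation.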

TL;DR Cosine similarity is a dot product of unit vectors. Pearson correlation is cosine similarity between centered vectors. The "Z-score transform" of a vector is the centered vector scaled to a norm of $\sqrt n$.