Solved – Is cosine similarity identical to l2-normalized euclidean distance

cosine distance, cosine similarity, euclidean, natural language, normalization

By "identical" I mean that it will produce identical results for a similarity ranking between a vector u and a set of vectors V.

I have a vector space model with the distance measure (euclidean distance, cosine similarity) and the normalization technique (none, l1, l2) as parameters. From my understanding, the results from the settings [cosine, none] should be identical, or at least very similar, to those from [euclidean, l2], but they aren't.

There is actually a good chance the system is still buggy, or do I have something critically wrong in my understanding of vectors?

edit: I forgot to mention that the vectors are based on word counts from documents in a corpus. Given a query document (which I also transform into a word-count vector), I want to find the document from my corpus which is most similar to it.

Just calculating their euclidean distance is a straightforward measure, but in the kind of task I work on, cosine similarity is often preferred as a similarity indicator, because vectors that differ only in length are still considered equal. The document with the smallest distance (or largest cosine similarity) is considered the most similar.
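The length-invariance point can be sketched with a toy corpus of word-count vectors (the counts below are made up for illustration): a document that repeats another document's words twice is far away in euclidean terms but identical in cosine terms.

```python
import numpy as np

# Hypothetical word-count vectors for a corpus of three documents
corpus = np.array([
    [2.0, 0.0, 1.0],
    [4.0, 0.0, 2.0],   # same direction as doc 0, twice the length
    [0.0, 3.0, 1.0],
])
query = np.array([1.0, 0.0, 0.5])

# Euclidean distance: smaller means more similar
euclid = np.linalg.norm(corpus - query, axis=1)

# Cosine similarity: larger means more similar
cosine = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))

print(euclid)  # docs 0 and 1 get different distances despite pointing the same way
print(cosine)  # docs 0 and 1 both score 1.0
```

Here docs 0 and 1 are scalar multiples of the query, so cosine similarity treats all three as equal (similarity 1.0), while euclidean distance separates them by length.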

Best Answer

For $\ell^2$-normalized vectors $\mathbf{x}, \mathbf{y}$, $$||\mathbf{x}||_2 = ||\mathbf{y}||_2 = 1,$$ we have that the squared Euclidean distance is an affine function of the cosine similarity, \begin{align} ||\mathbf{x} - \mathbf{y}||_2^2 &= (\mathbf{x} - \mathbf{y})^\top (\mathbf{x} - \mathbf{y}) \\ &= \mathbf{x}^\top \mathbf{x} - 2 \mathbf{x}^\top \mathbf{y} + \mathbf{y}^\top \mathbf{y} \\ &= 2 - 2\mathbf{x}^\top \mathbf{y} \\ &= 2 - 2 \cos\angle(\mathbf{x}, \mathbf{y}) \end{align} That is, even if you normalized your data and your algorithm was invariant to scaling of the distances, you would still expect numerical differences because of the squaring. The ranking, however, is unaffected: squaring is monotone on nonnegative distances, so sorting by euclidean distance on $\ell^2$-normalized vectors produces the same order as sorting by cosine similarity (reversed).