[Math] Cosine similarity / distance and triangle inequality

geometry, metric-spaces, reference-request, vector-spaces

There is a similarity function that is particularly popular for processing sparse vectors such as textual data (word frequency counts etc.), commonly referred to as cosine similarity.

There are two common ways to turn it into a dissimilarity, often referred to as cosine distance and arccos distance (distance in the loose sense, not the strict mathematical definition!).

In essence, the similarity function is:
$$\text{cosine-similarity}(A,B) = \frac{\left<A,B\right>}{||A||\cdot||B||}$$
Which is then used as a distance function as either
$$\text{cosine-dist}(A,B) := 1 - \text{cosine-similarity}(A,B)$$
$$\text{arccos-dist}(A,B) := \arccos(\text{cosine-similarity}(A,B))$$
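
To make the three definitions above concrete, here is a minimal Python sketch. The function names simply mirror the formulas; raising an error for the zero vector and clamping the ratio to $[-1, 1]$ before taking the arccos are my own choices for the sketch, not part of the definitions.

```python
import math

def cosine_similarity(a, b):
    """<a, b> / (||a|| * ||b||) for two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption of this sketch: refuse the 0/0 case instead of picking a value.
        raise ValueError("cosine similarity is undefined for the zero vector")
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))

def cosine_dist(a, b):
    return 1.0 - cosine_similarity(a, b)

def arccos_dist(a, b):
    return math.acos(cosine_similarity(a, b))
```

For orthogonal vectors such as `[1, 0]` and `[0, 1]` this gives a cosine distance of $1$ and an arccos distance of $\pi/2$.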

Obviously, neither of these can be a distance function on all of $\mathbb{R}^n$, since both are undefined at the zero vector, where the formula gives $0/0$. What is the proper result then? $1$? $\infty$?

I tried finding a formal proof on Google that these distances do or do not satisfy the triangle inequality. Wikipedia seems to claim only the second is a proper metric, but does not give a reference.

Update: reworked my question from here on, with updated thoughts on this issue.

As confirmed by joriki, the zero vector is a problem for this distance function, as one cannot compute the angle to this vector. There is another issue with this distance, which in many circumstances is intentional: two vectors that are positive multiples of each other have an angle of $0$ between them without being the same vector. See his reply on why cosine-dist does not satisfy the triangle inequality for small angles (I wonder if this issue is comparable to that of $L_p$ with $p<1$).

I have the following ideas in my mind, and again appreciate any pointers to literature, references, errors in these thoughts, extensions:

A) Instead of $\mathbb{R}^n$, let's look at the unit sphere, i.e. vectors of length $1=||A||=||B||$. $\arccos(\left<A,B\right>)$ is then the geodesic distance on the unit sphere, which is a metric, right? So in this restricted domain, arccos-dist is a proper distance?

B) Assuming I have an injective (not necessarily surjective) map from another domain to the unit sphere and then use this distance function on the images, does this also give a metric space? After all, each of the distance function properties should still hold, right?

C) Is arccos-dist a pseudo-metric on $\mathbb{R}^n \setminus \{0\}$? (i.e. I accept that $d(x,y) = 0 \not\Rightarrow x=y$, only $d(x,x)=0$)

Best Answer

Neither of these is a metric on $\mathbb R^n$ for several reasons, some of which you've pointed out. In a sense, the main reason, of which the undefined result for the zero vector is a symptom, is that this value depends only on the direction of the vectors and not on their length (and the zero vector has no direction).
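
As a small illustration of the "direction only" point, here is a Python sketch (the helper name `arccos_dist` and the sample vectors are just for illustration) showing that two different vectors on the same ray end up at distance zero, up to floating-point rounding.

```python
import math

def arccos_dist(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp so rounding error cannot push the ratio outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))

x = (1.0, 2.0, 3.0)
y = (2.0, 4.0, 6.0)  # y = 2 * x: same direction, different length

same_ray = arccos_dist(x, y)                       # zero up to rounding
opposite = arccos_dist(x, tuple(-v for v in x))    # pi up to rounding
print(x != y, same_ray, opposite)
```

So on $\mathbb R^n \setminus \{0\}$ distinct points can be at distance $0$, which is exactly what fails in the definition of a metric.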

If you want to consider these functions as metrics, you need to consider them on the set of directions (i.e. rays from the origin) in $\mathbb R^n$, or equivalently on the unit sphere $S^{n-1}$ in $\mathbb R^n$.

The second function, the "arccos distance", is just the angle between the two directions/vectors, and this is a metric because it's the geodesic distance on the unit sphere.
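
If you want a numerical sanity check rather than a reference, the following Python sketch samples random triples of unit vectors and records the largest amount by which the triangle inequality is exceeded. The sampling scheme, the dimension $3$, and the seed are arbitrary choices of mine, and of course a random test is not a proof.

```python
import math
import random

def arccos_dist(a, b):
    # a and b are assumed to be unit vectors, so <a, b> is the cosine of the angle.
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

def random_unit_vector(rng, n=3):
    # Normalizing a Gaussian sample gives a uniformly distributed direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(0)
worst_excess = -math.inf
for _ in range(1000):
    a, b, c = (random_unit_vector(rng) for _ in range(3))
    # Positive excess would mean d(a, c) > d(a, b) + d(b, c).
    excess = arccos_dist(a, c) - (arccos_dist(a, b) + arccos_dist(b, c))
    worst_excess = max(worst_excess, excess)

print(worst_excess)
```

The printed value should not be positive beyond rounding error, consistent with the angle being a metric on the sphere.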

The first function, the "cosine distance", isn't a metric because for small angles it approximately calculates (half) the square of the angle, $1-\cos \alpha\approx\frac12\alpha^2$, and if you turn by the same angle $\alpha$ twice, the sum of the two individual "distances" will be approximately $\frac12\alpha^2+\frac12\alpha^2=\alpha^2$ whereas the "distance" between the directions $2\alpha$ apart will be approximately $\frac12(2\alpha)^2=2\alpha^2$, which is twice as large, so the triangle inequality fails.
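
The small-angle argument can be checked with concrete numbers. This is a minimal Python sketch using three unit vectors in the plane at angles $0$, $\alpha$ and $2\alpha$; the value $\alpha = 0.1$ and the helper names are arbitrary choices for illustration.

```python
import math

def unit(theta):
    # Unit vector in the plane at angle theta from the x-axis.
    return (math.cos(theta), math.sin(theta))

def cosine_dist(a, b):
    # For unit vectors the cosine similarity is just the dot product.
    return 1.0 - (a[0] * b[0] + a[1] * b[1])

alpha = 0.1
A, B, C = unit(0.0), unit(alpha), unit(2 * alpha)

direct = cosine_dist(A, C)                      # 1 - cos(2*alpha), about 2*alpha**2
detour = cosine_dist(A, B) + cosine_dist(B, C)  # 2*(1 - cos(alpha)), about alpha**2
triangle_violated = direct > detour
print(direct, detour, triangle_violated)
```

The exact identities $1-\cos 2\alpha = 2\sin^2\alpha$ and $2(1-\cos\alpha) = 4\sin^2\frac\alpha2$ give a ratio of $2\cos^2\frac\alpha2$, so for three directions arranged like this the direct "distance" exceeds the detour for every $0<\alpha<\frac\pi2$, not only for small angles.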

To answer your questions:

  1. I don't know a source, but I hope the above arguments are immediate enough to convince without one.

  2. No, since you'd still get a distance of $0$ for any vectors in the same direction, so this could be at most a pseudometric.

  3. Never heard of those.

  4. There is none; the function will necessarily be discontinuous if you extend it to $0$, and the value at $0$ will necessarily be arbitrary because there are vectors in all directions arbitrarily close to $0$.
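
To see why no value at $0$ can work, here is a short Python sketch (the fixed vector `e1`, the angles, and the scale factors are arbitrary illustrative choices). It approaches the zero vector along three different directions and evaluates the cosine similarity to a fixed vector along the way.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

e1 = (1.0, 0.0)
scales = (1.0, 1e-3, 1e-9)  # shrinking towards the zero vector
sims_by_direction = {}
for theta in (0.0, math.pi / 2, math.pi):
    sims_by_direction[theta] = [
        cosine_similarity(e1, (t * math.cos(theta), t * math.sin(theta)))
        for t in scales
    ]

for theta, sims in sims_by_direction.items():
    print(theta, sims)
```

Along each ray the similarity stays at $\cos\theta$ no matter how close the vector gets to $0$, so the limits along different rays disagree and any value assigned at $0$ leaves the function discontinuous there.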
