Solved – distance measure of angles between two vectors, taking magnitude into account

clustering, distance

Suppose I have two vectors, v1 and v2, from which I can calculate the angle between these two vectors as a measure of their "distance", using the arccos function, say. For example:

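# Euclidean (L2) norm of a vector; used by the angle calculations
norm_vec <- function(x) sqrt(sum(x^2))

v1 = c(100, 200, 500, 600)
v2 = c( 50,  30,  10,   5)
v3 = c( 10,   7,  30,  40)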

# pairwise angles
acos( as.numeric((v1 %*% v2) / (norm_vec(v1) * norm_vec(v2))) ) * 180 / pi  # 66.8017
acos( as.numeric((v2 %*% v3) / (norm_vec(v2) * norm_vec(v3))) ) * 180 / pi  # 66.67337
acos( as.numeric((v1 %*% v3) / (norm_vec(v1) * norm_vec(v3))) ) * 180 / pi  # 8.061138

This kind of measure gives similar distances (angles) regardless of the magnitude of the elements in the vectors. For instance, the distance between v1 and v2 is 66.80, and the distance between v2 and v3 is similarly 66.67, but v2 and v3 are clearly much closer to each other in magnitude than v1 and v2 are. So I am looking for a measure that also takes "magnitude" into account when calculating the dissimilarity. In other words, dist(v1, v2) should be greater than dist(v2, v3), while the measure is still based on the pairwise angle idea. Thanks!

UPDATE

Thank you all for your replies! Regarding the equivalence of Euclidean and angle distances, I used the same vectors as above to calculate the Euclidean distances:

# Euclidean distance
ed <- function(x1,x2)
  sqrt(sum((x1 - x2) ^ 2))
ed(v1,v2)  
# 790.9014
ed(v2,v3)  
# 61.26989  --> imply v2, v3 are "closer"
ed(v1,v3)  
# 761.4782  --> imply v1, v3 are "far apart"

As you can see, the pairs (v1, v2) and (v1, v3) have Euclidean distances of similar magnitude (790.9 vs. 761.5), but their arccosine distances are very different (66.8 vs. 8.06). Is there anything in particular that the angle distance reveals but the Euclidean distance does not? I think the angle distance emphasizes orientation information.

Best Answer

There is a close relationship between Euclidean distance and Angular distance.
(see also: http://en.wikipedia.org/wiki/Cosine_similarity#Properties)

So if you want to take magnitude into account, you may actually be looking for Euclidean distance...

Let's look at squared Euclidean, for simplicity: $$ \begin{aligned} \sum_i (a_i - b_i)^2 &= \sum_i \left(a_i^2 - 2 a_i b_i + b_i^2\right) \\ &= \sum_i a_i^2 + \sum_i b_i^2 - 2 \sum_i a_i b_i = ||A||^2 + ||B||^2 - 2 (A \cdot B) \end{aligned} $$

Now let's assume $||A||=||B||=1$, i.e. vectors standardized to unit length.
Then Euclidean distance is $\sqrt{2 - 2 (A\cdot B)}$!

In essence, cosine similarity carries the same information as (squared) Euclidean distance after scaling each vector to unit length. That is, if your data differ greatly in magnitude but you do not want magnitude to be taken into account, use cosine.

Note that angular similarity is scale invariant: $$ \operatorname{CosSim}\left(\frac{A}{||A||}, \frac{B}{||B||}\right) = \operatorname{CosSim}\left(A, B\right) $$ And if I didn't screw up somewhere (please edit then!): $$ \operatorname{Euclidean}\left(A,B\right)^2 = ||A||^2 + ||B||^2 - 2 ||A|| ||B|| \operatorname{CosSim}\left(A, B\right) $$

(Law of Cosines).
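
As a quick numerical check of the Law of Cosines identity above, here is a minimal sketch using v1 and v2 from the question. The helpers `norm_vec` and `cos_sim` are assumed definitions written for this illustration (the question uses `norm_vec` without showing its definition).

```r
# Vectors from the question
v1 <- c(100, 200, 500, 600)
v2 <- c(50, 30, 10, 5)

# Assumed helper definitions (not shown in the original post)
norm_vec <- function(x) sqrt(sum(x^2))
cos_sim  <- function(a, b) sum(a * b) / (norm_vec(a) * norm_vec(b))

# Left-hand side: squared Euclidean distance
lhs <- sum((v1 - v2)^2)

# Right-hand side: ||A||^2 + ||B||^2 - 2 ||A|| ||B|| CosSim(A, B)
rhs <- norm_vec(v1)^2 + norm_vec(v2)^2 -
  2 * norm_vec(v1) * norm_vec(v2) * cos_sim(v1, v2)

all.equal(lhs, rhs)  # TRUE, up to floating-point tolerance
sqrt(lhs)            # 790.9014, the Euclidean distance from the question
```

The two norm terms are what make Euclidean distance sensitive to magnitude; the cosine term alone is not.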

Conversely, normalizing your data before using Euclidean yields: $$ \operatorname{Euclidean}\left(\frac{A}{||A||}, \frac{B}{||B||}\right)^2 = 2\left[1 - \operatorname{CosSim}\left(A, B\right)\right] $$ (Note that the right-hand side is, up to the factor of 2, a popular way of converting cosine similarity to a distance!)
And cosine similarity is a monotone decreasing function of squared Euclidean distance on the normalized vectors: $$ \operatorname{CosSim}\left(A, B\right) = 1-\frac{1}{2}\operatorname{Euclidean}\left(\frac{A}{||A||}, \frac{B}{||B||}\right)^2 $$
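
The relationship on normalized vectors can be checked the same way. This is only a sketch: `unit` is a hypothetical helper that scales a vector to unit length, and v1 and v3 are taken from the question.

```r
# Vectors from the question
v1 <- c(100, 200, 500, 600)
v3 <- c(10, 7, 30, 40)

# Hypothetical helper: scale a vector to unit length
unit <- function(x) x / sqrt(sum(x^2))

u1 <- unit(v1)
u3 <- unit(v3)

cs <- sum(u1 * u3)       # cosine similarity of v1 and v3
d2 <- sum((u1 - u3)^2)   # squared Euclidean distance after normalizing

all.equal(d2, 2 * (1 - cs))   # TRUE
all.equal(cs, 1 - d2 / 2)     # TRUE
```

So ranking pairs by Euclidean distance on unit-length vectors gives the same order as ranking them by angle.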

As per @whuber's comment: the formula $$ \operatorname{acos}\left[\operatorname{CosSim}\left(A, B\right)\right] $$ yields the geodesic distance on the surface of the unit sphere to get from the point $A/||A||$ to the point $B/||B||$. For obvious reasons, the $0$ vector yields undefined results.
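
A possible way to wrap the angle calculation in a function is sketched below. `angle_deg` is a hypothetical name, and two choices in it are assumptions rather than part of the answer: returning `NA` for a zero vector, and clamping the cosine to [-1, 1] so that rounding error cannot push `acos` outside its domain.

```r
# Hypothetical helper: angle between two vectors, in degrees
angle_deg <- function(a, b) {
  na <- sqrt(sum(a^2))
  nb <- sqrt(sum(b^2))
  # The angle is undefined if either vector is the zero vector
  if (na == 0 || nb == 0) return(NA_real_)
  cs <- sum(a * b) / (na * nb)
  # Guard against values slightly outside [-1, 1] caused by rounding
  cs <- max(-1, min(1, cs))
  acos(cs) * 180 / pi
}

v1 <- c(100, 200, 500, 600)
v3 <- c(10, 7, 30, 40)

angle_deg(v1, v3)            # about 8.06, as in the question
angle_deg(c(0, 0), c(1, 0))  # NA
```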