Solved – Cosine similarity

binary data, cosine similarity, similarities

Let's say I have two vectors whose entries are all $1$ or $-1$, and I want to know how similar these two vectors are. Is the use of the cosine similarity coefficient justified in this case?

Best Answer

Let $x, y\in\{-1,+1\}^k$. Then their cosine similarity is

$$ \cos\theta =\frac{x\cdot y}{\|x\|_2\|y\|_2}=\frac{x\cdot y}{k} $$

since every entry squares to $1$, which gives

$$ \|x\|_2=\|y\|_2=\sqrt{k}. $$

And

$$ x\cdot y = \#\{i\,|\,x_i=y_i\}-\#\{i\,|\,x_i\neq y_i\}$$

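simply counts the number of concordant pairs minus the number of discordant pairs. So your cosine similarity is simply this difference divided by $k$, which puts it in $[-1,+1]$: it equals $+1$ when the vectors agree everywhere and $-1$ when they disagree everywhere.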
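If you want to check the identity numerically, here is a minimal sketch in plain Python. The function names `cosine_similarity` and `agreement_score` and the example vectors are made up for illustration; they do not come from any library.

```python
import math

def cosine_similarity(x, y):
    """Standard cosine similarity: dot product over the product of L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def agreement_score(x, y):
    """(# concordant - # discordant) / k, valid only for entries in {-1, +1}."""
    k = len(x)
    concordant = sum(1 for a, b in zip(x, y) if a == b)
    discordant = k - concordant
    return (concordant - discordant) / k

# Hypothetical example vectors in {-1, +1}^6:
# they agree in 4 positions and disagree in 2.
x = [1, -1, 1, 1, -1, 1]
y = [1, 1, 1, -1, -1, 1]

print(cosine_similarity(x, y))
print(agreement_score(x, y))
```

Both calls should print the same value up to floating-point rounding, namely $(4-2)/6$. Since the number of discordant pairs is the Hamming distance $d_H(x,y)$, the same quantity can also be written as $1 - 2\,d_H(x,y)/k$.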

I'd say this kind of similarity makes perfect sense.