Solved – Weighted Cosine Similarity


There are at least two ways to turn cosine similarity into a weighted cosine similarity, but I don't know which one is better.

The first approach is to reweight each vector element-wise (below, $w \odot u$ denotes the element-wise product) and then compute the ordinary cosine similarity. In other words:

$$\begin{aligned}
\text{WeightedCosine}_1(u, v; w) &= \text{cosine}(w \odot u,\, w \odot v) \\
&= \frac{\sum_{i}(w_i u_i)(w_i v_i)}{\sqrt{\sum_{i}(w_i u_i)^2}\,\sqrt{\sum_{i}(w_i v_i)^2}} \\
&= \frac{\sum_{i}w_i^2 u_i v_i}{\sqrt{\sum_{i}(w_i u_i)^2}\,\sqrt{\sum_{i}(w_i v_i)^2}}
\end{aligned}$$
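A minimal NumPy sketch of this first approach (the function name `weighted_cosine1` is just illustrative):

```python
import numpy as np

def weighted_cosine1(u, v, w):
    # Reweight both vectors element-wise, then take the ordinary cosine similarity.
    wu, wv = w * u, w * v
    return np.dot(wu, wv) / (np.linalg.norm(wu) * np.linalg.norm(wv))
```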

The other approach is to compute each term of the cosine similarity without the weights and then take a weighted sum of those terms. In other words:
$$\text{WeightedCosine}_2(u, v; w) = \sum_{i} w_i \frac{u_i v_i}{\sqrt{\sum_{j} u_j^2}\,\sqrt{\sum_{j} v_j^2}}$$
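And a corresponding sketch of the second approach (again, the name `weighted_cosine2` is just for illustration):

```python
import numpy as np

def weighted_cosine2(u, v, w):
    # Weight each term u_i * v_i, but normalize by the unweighted vector norms.
    return np.sum(w * u * v) / (np.linalg.norm(u) * np.linalg.norm(v))
```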

Assuming that $$||u||_1=||v||_1=1$$, what are the pros and cons of each of them?

Best Answer

scipy.spatial.distance.cosine implements a weighted cosine distance via its optional `w` argument; the corresponding similarity, i.e. one minus the returned distance, is (source):

$$\frac{\sum_{i}{w_i u_i v_i}}{\sqrt{\sum_{i}w_i u_i^2}\sqrt{\sum_{i}w_i v_i^2}}$$
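To make this concrete, here is a small check (the example vectors are arbitrary) that one minus scipy's weighted cosine distance matches the formula above:

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([0.2, 0.3, 0.5])
v = np.array([0.1, 0.6, 0.3])
w = np.array([0.5, 0.3, 0.2])

# scipy returns a distance, i.e. 1 - similarity
sim_scipy = 1.0 - cosine(u, v, w=w)

# the weighted-similarity formula above, written out directly
sim_manual = np.sum(w * u * v) / (
    np.sqrt(np.sum(w * u ** 2)) * np.sqrt(np.sum(w * v ** 2))
)

print(np.isclose(sim_scipy, sim_manual))  # True
```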

I know this doesn't actually answer the question, but since scipy implements it this way, maybe it is better than both of your approaches.

Also, notice that plugging the square root of your weights into your first approach recovers the scipy formula, so the two differ only by a reparameterization of the weights. They give different results if the same weight vector, for example one normalized to sum to 1, is used in both.
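A quick numerical check of this point, reusing the sketch of the first approach from above (the example vectors are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import cosine

def weighted_cosine1(u, v, w):
    # First approach from the question: reweight element-wise, then ordinary cosine.
    wu, wv = w * u, w * v
    return np.dot(wu, wv) / (np.linalg.norm(wu) * np.linalg.norm(wv))

u = np.array([0.2, 0.3, 0.5])
v = np.array([0.1, 0.6, 0.3])
w = np.array([0.5, 0.3, 0.2])

# Plugging sqrt(w) into the first approach reproduces scipy's weighted cosine ...
print(np.isclose(weighted_cosine1(u, v, np.sqrt(w)), 1.0 - cosine(u, v, w=w)))  # True

# ... whereas using the same w directly in both generally gives different values.
print(np.isclose(weighted_cosine1(u, v, w), 1.0 - cosine(u, v, w=w)))  # False in general
```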
