[Math] Scale cosine similarity between vectors to the range [0, 1]

angle, vector analysis, vector-spaces

I am interested in calculating the similarity between vectors; however, this similarity has to be a number between 0 and 1. There are many questions concerning tf-idf and cosine similarity, all indicating that the value lies between 0 and 1. From Wikipedia:

In the case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies (using tf–idf weights) cannot be negative. The angle between two term frequency vectors cannot be greater than 90°.
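For context, here is a quick check of that claim with two made-up non-negative "term frequency" vectors (the numbers are arbitrary, purely for illustration):

import numpy as np

# Two made-up non-negative term-frequency vectors.
doc_a = np.array([3.0, 0.0, 1.0, 2.0])
doc_b = np.array([1.0, 4.0, 0.0, 2.0])

# Cosine similarity = dot product divided by the product of the norms.
cos_sim = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(cos_sim)  # lies in [0, 1] because no component is negative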

The peculiarity is that I wish to calculate the similarity between two vectors from two different word2vec models. These models have been aligned, though, so they should in fact represent their words in the same vector space. I can calculate the similarity between a word in model_a and a word in model_b like so:

import gensim as gs
from sklearn.metrics.pairwise import cosine_similarity

# Load the two aligned models (plain-text word2vec format).
model_a = gs.models.KeyedVectors.load_word2vec_format(model_a_path, binary=False)
model_b = gs.models.KeyedVectors.load_word2vec_format(model_b_path, binary=False)

# Reshape to (1, n_dimensions) because cosine_similarity expects 2-D arrays.
vector_a = model_a[word_a].reshape(1, -1)
vector_b = model_b[word_b].reshape(1, -1)

# cosine_similarity returns a 1x1 matrix; extract the scalar.
sim = cosine_similarity(vector_a, vector_b).item(0)

But sim is then a similarity metric in the [-1,1] range. Is there a scientifically sound way to map this to the [0,1] range? Intuitively I would think that something like

norm_sim = (sim + 1) / 2

is okay, but I'm not sure whether that is good practice with respect to the actual, mathematical meaning of cosine similarity. If not, are other similarity metrics advised?
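Concretely, the mapping I have in mind just rescales the endpoints linearly (a minimal sketch with hard-coded example values, not the vectors from my models):

# Linear rescaling from [-1, 1] to [0, 1]: -1 -> 0.0, 0 -> 0.5, 1 -> 1.0.
for sim in (-1.0, -0.5, 0.0, 0.5, 1.0):
    norm_sim = (sim + 1) / 2
    print(sim, "->", norm_sim)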

Best Answer

Please check out the Wikipedia page on cosine similarity (cosine_similarity_wiki); it discusses how to convert cosine similarity to angular distance.

In numpy:

import numpy as np

# arccos maps a cosine similarity in [-1, 1] to an angle in [0, pi];
# dividing by pi gives an angular distance in [0, 1].
angular_dis = np.arccos(cos_sim) / np.pi
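Applied to the sim from the question, that would look roughly like this (a sketch: the np.clip is only there to guard against floating-point values slightly outside [-1, 1], and angular_sim is just my name for one minus the angular distance):

import numpy as np

# cos_sim computed as in the question, e.g.:
# cos_sim = cosine_similarity(vector_a, vector_b).item(0)
cos_sim = 0.73  # placeholder value for illustration

# Clip to guard against values like 1.0000000002 from floating-point error,
# which would make arccos return NaN.
cos_sim = float(np.clip(cos_sim, -1.0, 1.0))

angular_dis = np.arccos(cos_sim) / np.pi   # distance in [0, 1]
angular_sim = 1.0 - angular_dis            # similarity in [0, 1]
print(angular_dis, angular_sim)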

You can also see it in the answer with 0 votes on this post: stackoverflow_post.
