Euclidean distance score and similarity

distance-functions, similarities

I'm currently working through the book Programming Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book, the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times \textrm{movie} \mapsto \textrm{score}$).

He calculates the Euclidean distance between two persons $p_1$ and $p_2$ as
$$d(p_1, p_2) = \sqrt{\sum_{i \in \textrm{item}} (s_{p_1,i} - s_{p_2,i})^2}$$
where $s_{p,i}$ is the score person $p$ gave to item $i$.
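A minimal Python sketch of this distance, assuming the ratings are stored as a dictionary of dictionaries (person → {movie: score}) and that the sum runs only over movies both people have rated. The names `ratings` and `euclidean_distance` and all of the data are made up for illustration.

```python
from math import sqrt

# Hypothetical ratings: person -> {movie: score}
ratings = {
    "Alice": {"Movie A": 2.5, "Movie B": 3.5, "Movie C": 3.0},
    "Bob":   {"Movie A": 3.0, "Movie B": 3.5, "Movie C": 1.5, "Movie D": 5.0},
}

def euclidean_distance(prefs, p1, p2):
    """Euclidean distance between two people's scores on the items both have rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        # An empty sum would give distance 0, wrongly suggesting identical tastes.
        raise ValueError(f"{p1} and {p2} have no rated items in common")
    return sqrt(sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared))

print(euclidean_distance(ratings, "Alice", "Bob"))
```

For Alice and Bob the squared differences are 0.25, 0 and 2.25, so the distance is $\sqrt{2.5} \approx 1.58$.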

This makes complete sense to me. What I don't really understand is why he then computes the following at the end to get a "distance-based similarity":

$$ \frac{1}{1 + d(p_1, p_2)} $$
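A sketch of the conversion under the same hypothetical dictionary-of-dictionaries layout. Returning 0.0 when two people share no rated movies is an assumption added here; the formula itself does not say what to do in that case. `distance_similarity` and the data are illustrative names, not taken from the book.

```python
from math import sqrt

# Hypothetical ratings: person -> {movie: score}
ratings = {
    "Alice": {"Movie A": 2.5, "Movie B": 3.5, "Movie C": 3.0},
    "Bob":   {"Movie A": 3.0, "Movie B": 3.5, "Movie C": 1.5, "Movie D": 5.0},
    "Carol": {"Movie A": 2.5, "Movie B": 3.5, "Movie C": 3.0},
    "Dave":  {"Movie E": 4.0},
}

def distance_similarity(prefs, p1, p2):
    """Map the Euclidean distance d over shared items to 1 / (1 + d)."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        # Assumption: no overlap means nothing to compare, so report 0 similarity.
        return 0.0
    d = sqrt(sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared))
    return 1 / (1 + d)

for other in ("Bob", "Carol", "Dave"):
    print(other, distance_similarity(ratings, "Alice", other))
```

Carol's scores match Alice's exactly, so she gets the maximum similarity of 1.0; Bob, at distance about 1.58, gets about 0.39.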

So I gather that this must be the conversion from a distance to a similarity (right?). But why does the formula look like this? Can someone explain it?

Best Answer

Taking the inverse converts a distance (larger means less alike) into a similarity (larger means more alike).

Adding 1 to the denominator makes the maximum value 1, reached when the distance is 0. It also avoids dividing by zero when two people's scores are identical.
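Plugging a few distances into the formula shows the resulting range:

$$ d = 0 \;\Rightarrow\; \frac{1}{1+0} = 1, \qquad d = 1 \;\Rightarrow\; \frac{1}{1+1} = \frac{1}{2}, \qquad \lim_{d \to \infty} \frac{1}{1+d} = 0 $$

so the similarity always lies in $(0, 1]$ and decreases as the distance grows.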

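As for the square root, I am not sure why it is there. It is monotonic, so it does not change which people come out as most similar; it only changes the spread of the scores. If the sum of squares is usually larger than 1, the root shrinks it, making large distances less important; if it is usually smaller than 1, the root enlarges it, making those distances more important.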
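A small experiment comparing the similarity with and without the square root, working directly from hypothetical sums of squared differences (the values in `sums_of_squares` are made up). It illustrates both points: the ranking is unchanged, but the root lowers similarities for sums below 1 and raises them for sums above 1.

```python
from math import sqrt

# Hypothetical sums of squared score differences between one user and others
sums_of_squares = {"Bob": 2.5, "Carol": 0.0, "Eve": 0.25, "Frank": 9.0}

def sim_with_root(ss):
    # 1 / (1 + d) with d = sqrt(sum of squares), i.e. the true Euclidean distance
    return 1 / (1 + sqrt(ss))

def sim_without_root(ss):
    # Same conversion applied to the raw sum of squares
    return 1 / (1 + ss)

for name, ss in sums_of_squares.items():
    print(f"{name:6s} sum_sq={ss:5.2f}",
          f"with_root={sim_with_root(ss):.3f}",
          f"without_root={sim_without_root(ss):.3f}")

# Both versions decrease monotonically in ss, so they rank neighbours identically.
rank_with = sorted(sums_of_squares, key=lambda n: sim_with_root(sums_of_squares[n]), reverse=True)
rank_without = sorted(sums_of_squares, key=lambda n: sim_without_root(sums_of_squares[n]), reverse=True)
print(rank_with == rank_without)
```

Eve (sum 0.25) drops from 0.8 to about 0.667 when the root is applied, while Frank (sum 9) rises from 0.1 to 0.25; the order Carol, Eve, Bob, Frank is the same either way.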