Calculating the average of all pairwise distances between individuals

Assume a population of four individuals T1, T2, T3, and T4. The Euclidean distance is calculated for each pair of the four individuals. I would like to calculate the average of all pairwise distances between all individuals. I found one paper calculating the average as:

$$\frac{\sum_{i=0}^{\mid P \mid }\sum_{j=0,j\neq i}^{\mid P \mid } dist(x_i,x_j)}{\mid P \mid(\mid P \mid - 1)}$$

where $\mid P \mid$ is the number of individuals in the population $P$ (4 in the example above).

I can't understand why the total distance is divided by $\mid P \mid(\mid P \mid - 1)$. I was expecting it to be divided by the number of combinations of two individuals (6 in the example above).

Is there a reason why the sum of all pairwise distances is divided by that quantity?

Best Answer

Notice that $\sum_{i=0}^{P}\sum_{j=0,j\neq i}^{P}dist(x_{i},x_{j})$ counts the distance between each pair of individuals twice: once as $(i,j)$ and once as $(j,i)$.

Hint: $dist(x_{i},x_{j})=dist(x_{j},x_{i})$

You are correct that the sum of all distances should be divided by the number of combinations. However, since the formula counts each distance twice, the double sum must first be divided by $2$. Thus the average distance is

$avg=\frac{\sum_{i=0}^{P}\sum_{j=0,j\neq i}^{P}dist(x_{i},x_{j})}{2}\frac{1}{P\choose2}$

$avg= \frac{\sum_{i=0}^{P}\sum_{j=0,j\neq i}^{P}dist(x_{i},x_{j})}{2\frac{P(P-1)}{2}}$

$avg= \frac{\sum_{i=0}^{P}\sum_{j=0,j\neq i}^{P}dist(x_{i},x_{j})}{P(P-1)}$
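
As a quick numerical check of the derivation above, here is a minimal Python sketch. The coordinates for T1 to T4 are made up for illustration (the question gives none), and the function names are hypothetical. It computes the average once over all ordered pairs, dividing by $P(P-1)$, and once over unordered pairs, dividing by the number of combinations.

```python
from itertools import combinations, permutations
from math import dist, isclose  # math.dist requires Python 3.8+

# Hypothetical coordinates for T1..T4; not taken from the question.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 5.0)]


def avg_over_ordered_pairs(pts):
    """Sum dist over all (i, j) with i != j, then divide by P(P-1)."""
    n = len(pts)
    total = sum(dist(a, b) for a, b in permutations(pts, 2))
    return total / (n * (n - 1))


def avg_over_unordered_pairs(pts):
    """Sum dist over each pair once, then divide by C(P, 2)."""
    pairs = list(combinations(pts, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


ordered = avg_over_ordered_pairs(points)
unordered = avg_over_unordered_pairs(points)

# Both averages agree up to floating-point rounding.
print(ordered, unordered, isclose(ordered, unordered))
```

With four individuals there are 12 ordered pairs and 6 unordered pairs, so the double sum is exactly twice the sum over combinations, and both divisions give the same average.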