[Math] Average path length: can’t figure out the meaning of $\large\frac{1}{n(n – 1)}$

graph theoryNetwork

I am doing a few tests with the Average Path Length formula. I am testing with a directed acyclic graph. I am generating all the shortest path using a breath first walk.

Everything works fine. However, I am trying to relate what I did with the Average Path Length formula I am seeing everywhere: wikipedia, math insight, etc.

All the definitions says:
$$
\bar P = \frac{1}{n \cdot (n \mathbin{-} 1)} \cdot \sum_{i \neq j}^{} d(v_{i},v_{j})
$$

All of them says that $n$ is the number of nodes in $G$.

The problem is that I am about 20 000 nodes in my graph and about 600 000 shortest paths. If I do:

$$
\bar P = \frac{1}{S} \cdot \sum_{i \neq j}^{} d(v_{i},v_{j})
$$

where $S$ is the number of shortest paths (600 000) then I get an average of 6.05 which is what I am expecting after looking at the distribution of my shortest path on an histogram.

However, if I do what I think that the formula is telling me, then I endup with doing:

$$
\bar P = \frac{1}{400000000} \cdot \sum_{i \neq j}^{} d(v_{i},v_{j})
$$

which is clearly not the average of my shortest paths.

So my question is: what is my issue with understanding this formula? I am clearly missing something in my thinking, but I don't know what.

Best Answer

Label all the vertices $V = \{v_1, \dots, v_n\}$. This formula is supposed to represent the average of all the shortest paths in the graph. Hence, given a node $v_1$, the average length of shortest paths emanating from $v_1$ is $$ \frac{1}{n-1}\sum_{i\neq 1}d(v_1, v_i), $$ since there are $n-1$ vertices distinct from $v_1$. Likewise, the average of shortest paths emanating from $v_k$ is $$ \frac{1}{n-1}\sum_{i\neq k}d(v_k, v_i). $$ So, if we wish to find the average of all shortest paths, we take the average of these: $$ \frac{1}{n}\left(\frac{1}{n-1}\sum_{i\neq 1}d(v_1, v_i) + \frac{1}{n-1}\sum_{i\neq 2}d(v_2, v_i) + \cdots + \frac{1}{n-1}\sum_{i\neq n}d(v_n, v_i)\right) $$ $$ = $$ $$ \frac{1}{n(n-1)}\sum_{i\neq j}d(v_i, v_j). $$

Related Question