Suppose $G$ is a simple, undirected graph. The corresponding adjacency matrix, $A$ is binary, symmetric, and has all zeroes on the diagonal.
By definition (Networks: An Introduction, M.E.J Newman), the clustering coefficient $C$ is given by
$$C=\frac{6 \times \text{Number of Triangles}}{\text{Number of Paths of Length 2}} $$
This should be equivalent to
$$C = \frac{tr(A^3)}{\sum_{i,j}(A^2)_{i,j}} $$
which follows from the fact that the number of paths of length $n$ between $i$ and $j$ is given by $(A^n)_{i,j}$ (so the number of triangles is $\frac{1}{6} tr(A^3)$ because there are 6 loops of length 3 in one triangle). But is this correct? The reason I have a doubt is because I can't get my formula to match with the R package igraph.
library(igraph)
# Calculate igraph's transitivity
G <- erdos.renyi.game(20, .5, "gnp")
igraphTrans <- transitivity(G)
# Calculate transitivity from my formula
A <- as_adj(G)
A2 <- crossprod(A) # A^2
numerator <- sum(diag(crossprod(A2, A))) # trace(A^3)
denominator <- sum(A2)
myTrans <- numerator/denominator
#
# Giving
print(igraphTrans)
[1] 0.5165336
print(myTrans)
[1] 0.4655704
What's the issue?
Thanks!
Best Answer
Original Poster here, I found the answer to the question.
The problem is "transitivity" wants to say "if $i$ is connected to $v$ and $v$ is connected to $w$, then what is the probability $i$ is connected to $w$?"
So, where the denominator says "paths of length 2", it should say "paths of length 2 that are not loops". We are not interested in paths like $i \rightarrow j \rightarrow i$; that defeats the purpose of transitivity.
So, we get:
$$ C = \frac{tr(A^3)}{\sum_{i\ne j} (A^2)_{i,j }}$$
The key difference with my wrong formula above (in the question) is that we need to exclude the diagonal by specifying $i\ne j$