Solved – Calculating Transitivity (Clustering Coefficient) from Adjacency Matrix, and igraph package

graph theoryigraphnetworkssocial network

Suppose $G$ is a simple, undirected graph. The corresponding adjacency matrix, $A$ is binary, symmetric, and has all zeroes on the diagonal.

By definition (Networks: An Introduction, M.E.J Newman), the clustering coefficient $C$ is given by

$$C=\frac{6 \times \text{Number of Triangles}}{\text{Number of Paths of Length 2}} $$

This should be equivalent to

$$C = \frac{tr(A^3)}{\sum_{i,j}(A^2)_{i,j}} $$

which follows from the fact that the number of paths of length $n$ between $i$ and $j$ is given by $(A^n)_{i,j}$ (so the number of triangles is $\frac{1}{6} tr(A^3)$ because there are 6 loops of length 3 in one triangle). But is this correct? The reason I have a doubt is because I can't get my formula to match with the R package igraph.

library(igraph)
# Calculate igraph's transitivity
G <- erdos.renyi.game(20, .5, "gnp")
igraphTrans <- transitivity(G)
# Calculate transitivity from my formula
 A <- as_adj(G)
A2 <- crossprod(A) # A^2
numerator <- sum(diag(crossprod(A2, A))) # trace(A^3)
denominator <- sum(A2) 
myTrans <- numerator/denominator

#
# Giving
 print(igraphTrans)
 [1] 0.5165336
  print(myTrans)
 [1] 0.4655704

What's the issue?

Thanks!

Best Answer

Original Poster here, I found the answer to the question.

The problem is "transitivity" wants to say "if $i$ is connected to $v$ and $v$ is connected to $w$, then what is the probability $i$ is connected to $w$?"

So, where the denominator says "paths of length 2", it should say "paths of length 2 that are not loops". We are not interested in paths like $i \rightarrow j \rightarrow i$; that defeats the purpose of transitivity.

So, we get:

$$ C = \frac{tr(A^3)}{\sum_{i\ne j} (A^2)_{i,j }}$$

The key difference with my wrong formula above (in the question) is that we need to exclude the diagonal by specifying $i\ne j$

Related Question