If there are $3 \binom n3 p^3$ triangles in expectation, and $3 \binom n3 p^2$ connected triples, the global clustering coefficient should approach their ratio (which is $p$).
Of course, naively taking their ratios doesn't work: $\mathbb E[\frac {X}Y]$ is not the same thing as $\frac{\mathbb E[X]}{\mathbb E[Y]}$. This is one of the main challenges in dealing with the expected value of a ratio. Instead, we'll show that both quantities are concentrated around their mean, and proceed that way.
Let $X$ denote the number of triangles in $\mathcal G(n,p)$. It's easy to see if we properly define triangles that $\mathbb E[X] = 3\binom n3 p^3$, which for consistency with connected triplets I want to define as $3\binom n3$ choices of a potential path $P_3$, and a $p^3$ chance that both edges of the path and the edge that makes it a triangle are present.
Moreover, the number of triangles is $3n$-Lipschitz in the edges of the graph (changing one edge changes the number of triangles by at most $3n$) so by McDiarmid's inequality
$$
\Pr[|X - \mathbb E[X]| \ge n^{2.5}] \le 2 \exp \left(-\frac{2n^5}{\binom n2 (3n)^2}\right) \le 2 e^{-4n/9}.
$$
If we let $Y$ be the number of connected triplets, then the expected number of them is $3\binom n3 p^2$: there are $3\binom n3$ potential copies of $P_3$ in the graph, and each is realized with a probability of $p^2$. This is actually $2n$-Lipschitz in the edges of the graph (each edge is part of at most $2n$ paths on $3$ vertices) but we'll round that up to $3n$-Lipschitz so that the same bound applies.
As a result, with probability $1 - 4 e^{-4n/9}$, both values are within $n^{2.5}$ of their expected value, and so the ratio $\frac{X}{Y}$ satisfies
$$
\frac{3\binom n3 p^3 - n^{2.5}}{3\binom n3 p^2 + n^{2.5}} \le \frac{X}{Y} \le \frac{3\binom n3 p^3 + n^{2.5}}{3\binom n3 p^2 - n^{2.5}}
$$
and therefore $\frac XY = p + O(n^{-0.5})$ in these cases. The remaining cases occur with probability $4e^{-4n/9}$, and the ratio is between $0$ and $1$ even then, so they can affect the expected value by at most $4e^{-4n/9}$ as well. Therefore $\mathbb E[\frac XY] = p + O(n^{-0.5})$, which approaches $p$ as $n \to \infty$.
With the same matrix calculations that gave you closed triangles, we can find open triplets.
The $(i,j)$ entry of $A^2$ counts the number of paths of length $2$ from $i$ to $j$. Then, computing $\mathbf 1^{\mathsf T}\!A^2 \mathbf 1$ (where $\mathbf 1$ is the all-ones vector) will add up all these values, telling you the total number of paths of length $2$ in your graph. I think Matlab can also do this with sum(A^2,'all')
or sum(sum(A^2))
, but I'm not a Matlab user so I can't be sure.
A set of three vertices $\{i,j,k\}$ will contribute:
- $0$ to this total if the graph has $0$ or $1$ of the edges between them, since then there are no paths of length $2$ involving $i,j,k$.
- $2$ to this total if the graph has $2$ of the edges between them, since then there are two paths (one in either direction).
- $6$ to this total if the graph has all $3$ of the edges between them, since then any permutation of $\{i,j,k\}$ is a path.
Therefore $\mathbf 1^{\mathsf T}\!A^2 \mathbf 1$ gives you $2$ times the number of open triplets plus $6$ times the number of closed triplets. You already have the closed triplets from $\operatorname{tr}(A^3)$, so this lets you solve for the open ones.
(Double-check that I'm getting the terminology right - I always forget how symmetry is counted for clustering coefficients - but the idea should be sound either way.)
Best Answer
One very concrete way to define the global clustering coefficient of a graph with vertex set $V$ and edge set $E$ is $$ C_{GC} = \frac{\text{# of ordered triples $(x,y,z) \in V^3$ such that $xy, yz, xz \in E$}}{\text{# of ordered triples $(x,y,z) \in V^3$ such that $xy, yz \in E$}}. $$ Another, equivalent definition (the one you're referring to) is $$ C_{GC} = \frac{\text{# of edge pairs $\{xy, yz\} \subset E$, with $x\ne z$, such that $xz \in E$}}{\text{# of edge pairs $\{xy, yz\} \subset E$, with $x\ne z$}}. $$ In the second definition, both the numerator and denominator will be exactly $\frac12$ of what they are in the first definition (since an edge pair $\{xy,yz\}$ can correspond to both the ordered triple $(x,y,z)$ and the ordered triple $(z,y,x)$). This doesn't change their ratio, so we get the same value of $C_{GC}$ either way.
A "connected triplet" is the denominator of the second definition: a pair of edges $xy, yz$ sharing one endpoint $y$. You could equivalently define a connected triplet as a subgraph isomorphic to $P_3$ (a path on $3$ vertices).
The numerator of the second definition is $3$ times the number of triangles if we define a triangle as a subgraph isomorphic to $C_3$ (a cycle on $3$ vertices), or a set of three vertices $\{x,y,z\}$ with all three edges $xy, yz, xz$ present. We have to multiply by $3$ because we can think of each such triangle in three ways:
As a result, the numerator of the second definition counts each triangle three times.