[Math] Expected global clustering coefficient for Erdős–Rényi graph

graph theoryprobabilityprobability distributionsrandom-graphs

What is the expected global clustering coefficient $\mathbb{E}[C_{GC}]$ for the Erdős–Rényi random graph (ER-graph) $\mathcal{G}(n,p)$ (expectation is over the ensemble of all ER-graphs) as $n \rightarrow \infty$ and $p$ fixed?

The global clustering coefficient $C_{GC}$ is defined as

$C_{GC}={\frac {3\times {\mbox{number of triangles}}}{{\mbox{number of connected triplets of vertices}}}}={\frac {{\mbox{number of closed triplets}}}{{\mbox{number of connected triplets of vertices}}}}$.

A connected triplet is defined to be a connected subgraph consisting of three vertices and two edges. A closed triplet is a connected triplet that induces a triangle.

While it is easy to see that the expected mean local clustering coefficient is $p$ (see next section), the expected global clustering coefficient is not identically $p$ for any $n$.

For example, for $n=3$, $C_{GC} = 1$ only when all edges are present (with probability $p^3$) and is otherwise zero (with probability $1-p^3$).
Hence the $\mathbb{E}[E_{GC}] = p^3$ when $n=3$.

Computationally, I have found that $\mathbb{E}[C_{GC}]\approx p$ for large $n$.

Is there a way to prove that $\mathbb{E}[C_{GC}]= p$ as $n\rightarrow\infty$?

My current theory is to use Chebyshev's inequality on this, but I haven't tried it out yet.

Expected local clustering coefficient = p

In contrast, it is easy to see that the expected local clustering coefficient $\mathbb{E}[C_i]$ for any node $i$ is $p$.

The local clustering coefficient $C_i$ of node $i$ (for an undirected network) is defined as

$C_i = \frac{\text{number of triangles that contain $i$}}{k_i (k_i-1)/2}$,
where $k_i$ is the degree of $i$.

In other words, it is the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them.

We have $\mathbb{E}[C_i]=p$ in an ER-graph because the probability for an edge between any neighbours of the node is $p$, independent for any other edge. (For an alternative answer involving more algebra, see here).

Hence the expected mean local clustering coefficient $\mathbb{E}[\sum_i C_i]$ is $p$ for any $n$.

Best Answer

If there are $3 \binom n3 p^3$ triangles in expectation, and $3 \binom n3 p^2$ connected triples, the global clustering coefficient should approach their ratio (which is $p$).

Of course, naively taking their ratios doesn't work: $\mathbb E[\frac {X}Y]$ is not the same thing as $\frac{\mathbb E[X]}{\mathbb E[Y]}$. This is one of the main challenges in dealing with the expected value of a ratio. Instead, we'll show that both quantities are concentrated around their mean, and proceed that way.

Let $X$ denote the number of triangles in $\mathcal G(n,p)$. It's easy to see if we properly define triangles that $\mathbb E[X] = 3\binom n3 p^3$, which for consistency with connected triplets I want to define as $3\binom n3$ choices of a potential path $P_3$, and a $p^3$ chance that both edges of the path and the edge that makes it a triangle are present.

Moreover, the number of triangles is $3n$-Lipschitz in the edges of the graph (changing one edge changes the number of triangles by at most $3n$) so by McDiarmid's inequality $$ \Pr[|X - \mathbb E[X]| \ge n^{2.5}] \le 2 \exp \left(-\frac{2n^5}{\binom n2 (3n)^2}\right) \le 2 e^{-4n/9}. $$ If we let $Y$ be the number of connected triplets, then the expected number of them is $3\binom n3 p^2$: there are $3\binom n3$ potential copies of $P_3$ in the graph, and each is realized with a probability of $p^2$. This is actually $2n$-Lipschitz in the edges of the graph (each edge is part of at most $2n$ paths on $3$ vertices) but we'll round that up to $3n$-Lipschitz so that the same bound applies.

As a result, with probability $1 - 4 e^{-4n/9}$, both values are within $n^{2.5}$ of their expected value, and so the ratio $\frac{X}{Y}$ satisfies $$ \frac{3\binom n3 p^3 - n^{2.5}}{3\binom n3 p^2 + n^{2.5}} \le \frac{X}{Y} \le \frac{3\binom n3 p^3 + n^{2.5}}{3\binom n3 p^2 - n^{2.5}} $$ and therefore $\frac XY = p + O(n^{-0.5})$ in these cases. The remaining cases occur with probability $4e^{-4n/9}$, and the ratio is between $0$ and $1$ even then, so they can affect the expected value by at most $4e^{-4n/9}$ as well. Therefore $\mathbb E[\frac XY] = p + O(n^{-0.5})$, which approaches $p$ as $n \to \infty$.

Related Question