I'm running experiments on the Erdős–Rényi random graph, denoted $G(n, p)$. So far there is clearly a phase transition: if $p > T$, where $T > \frac{1}{n}$ is a threshold, then the graph ends up with only a single connected component.
Is there an analytical solution for small fixed $p$? Can one derive or predict the asymptotic growth, or at least determine lower/upper bounds? Is there a relation between $p$ and the number of connected components $C_p$?
Since the implementation provides a random generator with a different seed on each run, multiple attempts were made to see the overall tendency.
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

node_counts = np.arange(5, 200, 1)  # first argument of gnp_random_graph is the number of nodes
probability = 0.1
for color in ['r', 'g', 'b', 'y', 'm']:
    components = []
    for n in node_counts:
        G = nx.gnp_random_graph(n, probability)
        components.append(nx.number_connected_components(G))
    plt.plot(node_counts, components, lw=1, color=color, label=f'p = {probability}')
When $p$ is small enough, one can observe roughly linear (presumably $O(n)$) growth. As $p$ increases, the behavior becomes noisier and I can no longer draw conclusions from the plots.
Thank you in advance!
Best Answer
The number of components is still determined by the relationship between $n$ and $p$. In $G_{n,p}$ where we consider $n$ to be the primary parameter and $p$ to be a function of $n$, there are several primary cutoffs. For large $n$, we expect that
If $p < \frac1n$, then there are few cycles, and so the number of components is roughly $n - \binom n2p$. (It starts at $n$, and in the absence of cycles, each of an average of $\binom n2p$ edges reduces the number of components by $1$.) As we get closer and closer to $\frac1n$, this approximation gets worse, but the number of components should still be linear in $n$.
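As a quick sanity check of the subcritical estimate $n - \binom n2 p$, here is a minimal simulation sketch using networkx; the particular values of $n$, $p$, and the seed are arbitrary:

```python
import networkx as nx

n = 10_000
p = 0.5 / n  # well below the 1/n threshold, so cycles are rare
G = nx.gnp_random_graph(n, p, seed=0)

observed = nx.number_connected_components(G)
predicted = n - (n * (n - 1) / 2) * p  # n - C(n,2)*p
print(observed, predicted)
```

The two numbers should agree up to fluctuations in the edge count (and the handful of cycles the graph happens to contain).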
If $p > \frac1n$, there is a giant component. Estimates here are more annoying to get, but the idea is that for each $k \ll \log n$, the number of components of size $k$ is linear in $n$, and you can estimate the number of them by analyzing a branching process.
Once $p$ is sufficiently larger than $\frac1n$ (when $np \to \infty$ as $n \to \infty$), the most frequent small components are isolated vertices, and the average number of these is $n(1-p)^{n-1} \sim n e^{-np}$.
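The isolated-vertex count is easy to check directly; a sketch with illustrative parameters ($np = 3$ here, so we are in the supercritical regime):

```python
import networkx as nx

n = 5_000
p = 3.0 / n  # np = 3, comfortably above the 1/n threshold
G = nx.gnp_random_graph(n, p, seed=0)

isolated = len(list(nx.isolates(G)))
expected = n * (1 - p) ** (n - 1)  # roughly n * exp(-n*p)
print(isolated, expected)
```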
For $p$ around $\frac{\log n}{n}$, even the isolated vertices disappear or almost disappear.
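One can see the connectivity threshold empirically by sampling graphs at $p = c\frac{\log n}{n}$ for a constant $c$ on either side of $1$ (a rough sketch; the number of trials and the value of $n$ are arbitrary):

```python
import math
import networkx as nx

n = 2_000
trials = 20
frac_connected = {}
for c in (0.5, 2.0):  # constant in front of log(n)/n
    p = c * math.log(n) / n
    hits = sum(nx.is_connected(nx.gnp_random_graph(n, p, seed=s))
               for s in range(trials))
    frac_connected[c] = hits / trials
print(frac_connected)
```

Below the threshold ($c = 0.5$) isolated vertices survive and the graph is essentially never connected; above it ($c = 2$) it is connected in almost every trial.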
If you fix $p$ and vary $n$, the cutoffs appear in the same places. The problem is that all of the usual approximations for random graphs are valid for large $n$; for a large $p$ like $p = 0.1$, the cutoff $p = \frac1n \iff n = \frac1p$ occurs at a small value of $n$, so the approximations are not very good there. But for large $p$, we expect the number of components to shrink down to $1$ very quickly anyway.
Once you're comparing $p=0.01$ or $p=0.001$ or $p=0.00001$, the descriptions above are going to be correct, except to see where they occur, you should solve for $n$ in terms of $p$.
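For instance, translating the two cutoffs into values of $n$ for a few fixed choices of $p$ (a small numerical sketch; the fixed-point iteration solves $\log n = np$ and converges to the larger root):

```python
import math

for p in (0.01, 0.001, 0.00001):
    n_giant = 1 / p  # giant component appears around p = 1/n
    # connectivity threshold p = log(n)/n, i.e. solve log(n) = n*p
    n_conn = 10.0
    for _ in range(200):
        n_conn = math.log(n_conn) / p
    print(f"p = {p}: giant component near n = {n_giant:.0f}, "
          f"connectivity near n = {n_conn:.0f}")
```

So for $p = 0.001$, say, the interesting behavior is confined to roughly $n = 1000$ (giant component) up to around $n \approx 9000$ (connectivity); outside that window the picture is the simple one described above.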