Law of large numbers for the number of connected components in a random graph

convergence-divergence, law-of-large-numbers, probability, probability-limit-theorems, random-graphs

A network evolves similarly to the Preferential Attachment model, with an important modification. The network starts at time $t = 1$ with one isolated vertex. At every step
$t \ge 2$, a new vertex, $v_t$, arrives and attaches by exactly one edge: it connects to each of the $t - 1$ already existing vertices with probability $1/t$, and it connects to itself (forming a self-loop) also with probability $1/t$.

Let $N_t$ denote the number of connected components in this evolving graph at time $t$. To show: there exists a deterministic sequence $a_t$ such that $N_t/a_t \xrightarrow{\mathbb{P}} 1$ as $t \to \infty$. Here, $t = 1, 2, \dotsc$.
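For intuition, here is a quick simulation (a minimal sketch in Python; the function name `simulate_components` and the parameter choices are mine, purely for illustration). Since every arriving vertex brings exactly one edge, components never merge, so the component count increases exactly when a self-loop is drawn:

```python
import random

def simulate_components(t_max, rng):
    """Run the growth process up to time t_max and return N_{t_max}.

    At step t the arriving vertex picks one of the t possible endpoints
    (the t - 1 existing vertices or itself) uniformly at random; only a
    self-loop starts a new connected component."""
    n_components = 0
    for t in range(1, t_max + 1):
        if rng.randrange(t) == t - 1:   # the new vertex chose itself: self-loop
            n_components += 1
    return n_components

rng = random.Random(0)
t_max, runs = 10_000, 200
avg = sum(simulate_components(t_max, rng) for _ in range(runs)) / runs
print(avg, sum(1.0 / k for k in range(1, t_max + 1)))  # both close to H_t ≈ 9.79
```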

My attempt so far: I was thinking of computing $\mathbb{E}[N_t]$ and showing that for every $\varepsilon > 0$,
$$
\lim_{t \to \infty}\mathbb{P}(|N_t - \mathbb{E}[N_t]| \geq \varepsilon) = 0.
$$

First, of course, I need to determine $\mathbb{E}[N_t]$. Let $G_t$ be the graph at step $t$ and $n_t = |V(G_t)|$. From the given information, I know that for $i, j \in [n_t]$ with $i \neq j$,
$$
\mathbb{P}(i \longrightarrow j | G_t) = \frac{1}{t} = \mathbb{P}(i \longrightarrow i | G_t).
$$

I have that
$$
\mathbb{E}[N_{t+1}] = \mathbb{E}[\mathbb{E}[N_{t+1}|G_t]] = \mathbb{E}[\mathbb{E}[N_{t+1} - N_t + N_t|G_t]] = \mathbb{E}[N_t] + \mathbb{E}[\mathbb{E}[N_{t+1} - N_t|G_t]].
$$

Given $G_t$, I can think of only one way for $N_{t+1} - N_t$ to be nonzero. If the vertex arriving at step $t + 1$ connects to some vertex of $G_t$, then the number of connected components does not change, so $N_{t+1} = N_t$. However, if that vertex forms a self-loop, which happens with probability $1/(t+1)$, then the number of connected components increases by one. Therefore,
$$
\mathbb{E}[N_{t+1}] = \mathbb{E}[N_{t}] + \frac{1}{t+1}.
$$

This is a recurrence relation that I can solve. We have $\mathbb{E}[N_1] = 1$, which is reasonable because at $t = 1$ there is only one isolated vertex and hence a single connected component. Then,
$$
\mathbb{E}[N_2] = 1 + \frac{1}{2}; \ \mathbb{E}[N_3] = 1 + \frac{1}{2} + \frac{1}{3}; \ \mathbb{E}[N_4] = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}; \cdots; \ \mathbb{E}[N_t] = \sum_{k=1}^t\frac{1}{k}.
$$
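As a sanity check (a throwaway Python snippet, not part of the argument), the harmonic sum can be computed directly and compared with the logarithmic bounds $\log(t+1)$ and $1 + \log t$ that appear in the hint below:

```python
import math

# Numerical check that E[N_t] = H_t (the t-th harmonic number) lies between
# log(t + 1) and 1 + log(t), the bounds quoted in the hint below.
for t in (10, 1_000, 100_000):
    harmonic = sum(1.0 / k for k in range(1, t + 1))
    print(t, math.log(t + 1), harmonic, 1 + math.log(t))
```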

Now I need to prove the convergence in probability. By Chebyshev's inequality,
$$
\mathbb{P}(|N_t - \mathbb{E}[N_t]| \geq \varepsilon) \leq \frac{\operatorname{Var}(N_t)}{\varepsilon^2} = \frac{\mathbb{E}[N_t^2] - \mathbb{E}[N_t]^2}{\varepsilon^2}.
$$

Here is where I get stuck: I have no idea what the second moment of $N_t$ is. There is a hint that
$$
\log(t+1) \leq \sum_{k=1}^t\frac{1}{k} \leq 1 + \log t
$$

But this only lets me bound $\mathbb{E}[N_t]^2$ from above. Another thing I have tried is to write the probability as follows:
$$
\mathbb{P}(|N_t - \mathbb{E}[N_t]| \geq \varepsilon) = \mathbb{P}(N_t - \mathbb{E}[N_t] \leq -\varepsilon) + \mathbb{P}(N_t - \mathbb{E}[N_t] \geq \varepsilon).
$$

Then, by Markov's inequality and the given hint,
$$
\mathbb{P}(N_t - \mathbb{E}[N_t] \geq \varepsilon) \leq \frac{\mathbb{E}[N_t]}{\varepsilon + \mathbb{E}[N_t]} \leq \frac{1 + \log t}{\varepsilon + \log(t+1)}.
$$

This does not go to zero, and I am not sure how to bound $\mathbb{P}(N_t - \mathbb{E}[N_t] \leq -\varepsilon)$ either. What should I do?

Best Answer

Before I address the question, let me comment that the convergence $|N_t - \mathbb E[N_t]| \xrightarrow{\mathbb P} 0$ you've been aiming for is much stronger than the convergence $N_t/\mathbb E[N_t] \xrightarrow{\mathbb P} 1$. The second statement (which is the one we want here) merely requires $$ \lim_{t \to \infty} \Pr\Big[\big|N_t - \mathbb E[N_t]\big| \ge \varepsilon \mathbb E[N_t]\Big] = 0. $$ We can still establish this using Chebyshev's inequality, but now the error term it gives us is $\frac{\text{Var}[N_t]}{\varepsilon^2 \mathbb E[N_t]^2}$.


The key is the observation that $N_t$ is equal to $N_{t-1}+1$ if the $t^{\text{th}}$ vertex decides to have a loop, and to $N_{t-1}$ otherwise. A more useful way of stating this is to define a Bernoulli random variable $$L_t = \begin{cases}1 & \text{ if there is a loop at $t$,} \\ 0 & \text{otherwise.}\end{cases}$$ Then we have $N_t = N_{t-1} + L_t$, which actually lets us rewrite $N_t$ as a sum: $$N_t = L_1 + L_2 + \cdots + L_t.$$ Importantly, $L_1, \dots, L_t$ are independent (a vertex does not look at the choices made by previous vertices) and so we can compute moments of $N_t$ from the moments of $L_t$.
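To make the decomposition concrete, here is a minimal sketch (Python; the name `sample_N` and the parameter choices are mine, purely for illustration) that draws $N_t$ as a sum of independent Bernoulli$(1/i)$ indicators and compares the empirical mean and variance with the exact sums:

```python
import random

def sample_N(t, rng):
    """One sample of N_t, drawn as a sum of independent Bernoulli(1/i) indicators L_i."""
    return sum(rng.random() < 1.0 / i for i in range(1, t + 1))

rng = random.Random(0)
t, reps = 1_000, 5_000
samples = [sample_N(t, rng) for _ in range(reps)]

mean_emp = sum(samples) / reps
var_emp = sum((x - mean_emp) ** 2 for x in samples) / reps
mean_exact = sum(1.0 / i for i in range(1, t + 1))                     # sum_i 1/i
var_exact = sum((1.0 / i) * (1.0 - 1.0 / i) for i in range(1, t + 1))  # sum_i (1/i)(1 - 1/i)
print(mean_emp, mean_exact)   # both close to H_1000 ≈ 7.49
print(var_emp, var_exact)     # the variance is even smaller than the mean
```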

Just as we have $$\mathbb E[N_t] = \mathbb E[L_1] + \mathbb E[L_2] + \dots + \mathbb E[L_t] = 1 + \frac12 + \cdots + \frac1t$$ we can also write $$\text{Var}[N_t] = \text{Var}[L_1] + \text{Var}[L_2] + \dots + \text{Var}[L_t] = 1 \cdot 0 + \frac12 \cdot \frac12 + \cdots + \frac1t \cdot \frac{t-1}{t}.$$ (This relies on independence - variance would not otherwise be additive.)

This is an obnoxious sum that does not have a nice closed form, but we don't need it to. All we need to know is that:

  • $\text{Var}[N_t] \le \mathbb E[N_t]$, because $\frac1i \cdot \frac{i-1}{i} \le \frac1i$ for $i=1, \dots, t$. (It would be enough to know that $\text{Var}[N_t] \ll \mathbb E[N_t]^2$.)
  • As $t \to \infty$, $\mathbb E[N_t] \to \infty$.

This is enough to conclude that we get the convergence in probability we want. Using Chebyshev's inequality, $$ \Pr\Big[\big|N_t - \mathbb E[N_t]\big| \ge \varepsilon \mathbb E[N_t]\Big] \le \frac{\text{Var}[N_t]}{\varepsilon^2 \mathbb E[N_t]^2} \le \frac1{\varepsilon^2 \mathbb E[N_t]} $$ which converges to $0$ as $t\to\infty$.
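If you want to see this quantitatively, the following sketch (Python, parameter choices purely illustrative) estimates $\Pr\big[|N_t/\mathbb E[N_t] - 1| \ge \varepsilon\big]$ by simulation and prints the Chebyshev bound $\frac{1}{\varepsilon^2 \mathbb E[N_t]}$ next to it. Both shrink as $t$ grows, though only slowly, because $\mathbb E[N_t]$ grows like $\log t$:

```python
import random

def sample_N(t, rng):
    # N_t as a sum of independent Bernoulli(1/i) indicators
    return sum(rng.random() < 1.0 / i for i in range(1, t + 1))

rng = random.Random(1)
eps, reps = 0.5, 2_000
for t in (100, 1_000, 10_000):
    mean = sum(1.0 / i for i in range(1, t + 1))      # E[N_t] = H_t
    hits = sum(abs(sample_N(t, rng) / mean - 1) >= eps for _ in range(reps))
    chebyshev = 1.0 / (eps ** 2 * mean)
    print(t, hits / reps, chebyshev)   # empirical probability vs. Chebyshev bound
```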

By the way, the random process in this question is known as the Chinese restaurant process, and the distribution of $N_t$ is the distribution of the number of cycles in a uniformly random permutation of $\{1,\dots,t\}$.
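This connection is easy to test empirically; a small sketch (Python, names and parameters purely illustrative) counts the cycles of uniformly random permutations and recovers the same mean $H_t$:

```python
import random

def cycle_count(perm):
    """Number of cycles of a permutation of {0, ..., n-1} given as a list."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

rng = random.Random(2)
t, reps = 200, 10_000
total = 0
for _ in range(reps):
    p = list(range(t))
    rng.shuffle(p)
    total += cycle_count(p)
print(total / reps, sum(1.0 / k for k in range(1, t + 1)))  # both ≈ H_200 ≈ 5.88
```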
