Variance using indicator variables

A graph $G=(V, E)$ has $2n$ vertices, of which $n$ are coloured blue and $n$ are coloured red. Each pair of vertices is joined by an edge independently with probability $1/2$. Let $Y$ denote the number of edges whose endpoints have the same colour. Compute the variance of $Y$.

I also need to compute the variance of the number of edges whose endpoints have the same colour; I denote this random variable by $Y$ (in an earlier part it was called $X$). Here is my approach:

\begin{align*}
Y_{\{u, v\}}:= \begin{cases}
1, & \text{if the edge } \{u, v\} \text{ exists} \land \left(
u, v \text{ are blue} \lor u, v\text{ are red}
\right),
\\[5pt]
0, & \text{else}.
\end{cases}
\end{align*}
So far I have not managed to determine $\mathbb{E}[Y_{\{u, v\}}]$.
We will use the identity
$$
\text{Var}\left[X \right] =\mathbb{E}\left[X^2 \right]-
\mathbb{E}\left[X \right]^2.
$$
\begin{align*}
\mathbb{E}\left[Y^2 \right]
= \mathbb{E}\left[
\sum_{\{u, v\} \subseteq V}^{} \sum_{\{x, y\} \subseteq V}^{}
Y_{\{u, v\} }\cdot Y_{\{x, y\} }
\right]
=\sum_{\{u, v\} \subseteq V}^{} \sum_{\{x, y\} \subseteq V}^{}
\mathbb{E}\left[Y_{\{u, v\} }\cdot Y_{\{x, y\} }\right]
.\end{align*}

Now we do a case distinction:
\begin{align*}
&1.\quad \{u, v\} =\{x, y\}\colon
\mathbb{E}\left[Y_{\{u, v\} }\cdot Y_{\{x, y\} }\right]
=
\mathbb{E}\left[Y_{\{u, v\} }^2\right]=
\mathbb{E}\left[Y_{\{u, v\} }\right]=?.
\\[10pt]
&2. \quad |\{u, v\} \cap \{x, y\}| =1\colon ?
\\[10pt]
&3. \quad \{u, v\} \cap \{x, y\} = \varnothing \colon
\mathbb{E}\left[Y_{\{u, v\} }\cdot Y_{\{x, y\} }\right]
=\mathbb{E}\left[ Y_{\{u, v\} } \right]\cdot \mathbb{E}\left[Y_{\{x, y\} }
\right]=?.
\end{align*}

I was given the hint to define a suitable indicator variable and then split the expectation as above. However, I'm stuck at the calculation. First, is my choice of indicator variable correct? If yes, how do I compute $\mathbb{E}\left[Y_{\{u, v\} }\right]$ and handle the second case?

Edit: I'm not allowed to make use of the binomial distribution.
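One way to check candidate values for the three cases is brute force over every graph for a small $n$. The Python sketch below is only an illustration under stated assumptions: it fixes vertices $0,\dots,n-1$ as blue and $n,\dots,2n-1$ as red, treats all $2^{\binom{2n}{2}}$ edge sets as equally likely (each edge present independently with probability $1/2$), and `case_expectations` is a hypothetical helper name. It only uses pairs $\{u,v\}$ with equally coloured endpoints, since for the others $Y_{\{u,v\}}$ is identically $0$, and it averages $\mathbb{E}[Y_{\{u,v\}}Y_{\{x,y\}}]$ over all ordered pairs within each case, keyed by $|\{u,v\}\cap\{x,y\}| \in \{2,1,0\}$ (cases 1, 2, 3 above).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def case_expectations(n):
    """Exact average of E[Y_e * Y_f] over monochromatic pairs e, f, by |e ∩ f|.

    Assumption: vertices 0..n-1 are blue, n..2n-1 are red, and every one of
    the C(2n, 2) potential edges is present independently with probability 1/2.
    """
    pairs = list(combinations(range(2 * n), 2))
    # indices of pairs whose endpoints have the same colour
    mono = [i for i, (u, v) in enumerate(pairs) if (u < n) == (v < n)]
    # every ordered pair (e, f) of monochromatic pairs, tagged with |e ∩ f|
    cases = [(i, j, len(set(pairs[i]) & set(pairs[j]))) for i in mono for j in mono]
    counts = Counter(k for _, _, k in cases)

    totals = Counter()
    for present in product((0, 1), repeat=len(pairs)):  # one graph per edge set
        for i, j, k in cases:
            totals[k] += present[i] * present[j]  # value of Y_e * Y_f in this graph

    n_graphs = 2 ** len(pairs)
    return {k: Fraction(totals[k], counts[k] * n_graphs) for k in counts}


print(case_expectations(3))
```

For $n = 3$ all three cases occur, and the averages come out as $\frac12$, $\frac14$, $\frac14$. Case 2 factorizes just like case 3: two different potential edges are independent whether or not they share a vertex.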

Best Answer

Since there are $n(n-1) = 2 \binom n2$ potential edges joining two vertices of the same color, and each one exists independently with probability $\frac12$, we have $Y \sim \text{Binomial}(n(n-1), \frac12)$, so you can just use the formula for the variance of a binomial: $\operatorname{Var}[Y] = n(n-1)\cdot\frac12\cdot\frac12 = \frac{n(n-1)}{4}$.
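The binomial value $n(n-1)\cdot\frac12\cdot\frac12 = \frac{n(n-1)}{4}$ can be checked for small $n$ without any distributional formula by enumerating every graph. A minimal Python sketch, assuming vertices $0,\dots,n-1$ are blue and $n,\dots,2n-1$ are red; the function name `same_colour_edge_variance` is hypothetical:

```python
from fractions import Fraction
from itertools import combinations, product


def same_colour_edge_variance(n):
    """Exact Var[Y] by enumerating every graph on 2n coloured vertices.

    Assumption: vertices 0..n-1 are blue, n..2n-1 are red, and each of the
    C(2n, 2) potential edges is present independently with probability 1/2.
    """
    pairs = list(combinations(range(2 * n), 2))
    same = [(u < n) == (v < n) for u, v in pairs]  # True if equally coloured
    n_graphs = 2 ** len(pairs)
    m1 = m2 = 0
    for present in product((0, 1), repeat=len(pairs)):
        y = sum(bit for bit, s in zip(present, same) if s)  # value of Y
        m1 += y
        m2 += y * y
    mean = Fraction(m1, n_graphs)
    return Fraction(m2, n_graphs) - mean ** 2  # E[Y^2] - E[Y]^2


for n in (1, 2, 3):
    print(n, same_colour_edge_variance(n), Fraction(n * (n - 1), 4))
```

Each printed row shows the enumerated variance next to $n(n-1)/4$; the enumeration grows as $2^{n(2n-1)}$, so it is only a sanity check for tiny $n$.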

As for your choice of indicator variables: it will lead you to the correct answer if you do everything correctly, but it makes everything too difficult. You have $n^2$ indicator variables that are identically $0$: the $Y_{\{u,v\}}$ where $u$ is red and $v$ is blue. With the indicator variable approach, it is better to define $V = R \cup B$ as the partition of the vertex set into colors and take $$ Y = \sum_{\{u,v\} \subseteq R} Y_{\{u,v\}} + \sum_{\{u,v\} \subseteq B} Y_{\{u,v\}}. $$ All these indicator variables are independent and have expected value $\frac12$, so if you are not allowed to use the binomial distribution, you will end up re-deriving its variance.
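For completeness, here is a sketch of how that re-derivation goes with the $\mathbb{E}[Y^2]$ method from the question. Write $N = n(n-1)$ for the number of monochromatic pairs (notation introduced only here) and let $e, f$ range over those pairs; the bichromatic indicators are identically $0$ and drop out:
$$
\begin{aligned}
\mathbb{E}[Y] &= \sum_{e} \mathbb{E}[Y_e] = \frac{N}{2},\\
\mathbb{E}[Y^2] &= \sum_{e}\sum_{f} \mathbb{E}[Y_e Y_f]
= \underbrace{N\cdot\frac12}_{e = f} + \underbrace{N(N-1)\cdot\frac14}_{e \neq f},\\
\operatorname{Var}[Y] &= \frac{N}{2} + \frac{N(N-1)}{4} - \frac{N^2}{4} = \frac{N}{4} = \frac{n(n-1)}{4}.
\end{aligned}
$$
Pairs $e \neq f$ that share a vertex need no separate treatment: $Y_e$ and $Y_f$ depend on different potential edges, which are independent, so $\mathbb{E}[Y_eY_f] = \frac14$ there as well.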

Related Solutions

Coin Tosses and Variance 3 Runs of Heads

In the following we write $n$ instead of the more specific $10$ (the number of people). The indices $j,k$ are taken modulo $n$ (so $j\pm1$ is also reduced modulo $n$). The argument works for any $n\ge 6$.

Let $X_k$ be the random variable on $\{0,1\}^n$ which is $1$ if the components $k-1,k,k+1$ are all heads, else $0$.

The computation of $\Bbb E X_k = \frac 1{2^3}= \frac 18$ is ok, so $$\Bbb E X =\Bbb E\sum_k X_k =\sum_k \Bbb E X_k = \sum_k \frac 18 = \frac n8\ .$$

Now we compute explicitly, for some fixed $k$:
$$ \begin{aligned} \Bbb E X_k^2 &=\frac 1{2^3}\ ,\text{ positions $k-1,k,k+1$ are heads,}\\ \Bbb E X_kX_{k\pm 1} &=\frac 1{2^4}\ ,\text{ positions $k-1,k,k+1$ and also $k\pm2$ are heads,}\\ \Bbb E X_kX_{k\pm 2} &=\frac 1{2^5}\ ,\text{ positions $k-1,k,k+1$ and also $k\pm2,k\pm 3$ are heads,}\\ \Bbb E X_kX_j &=\frac 1{2^6}\ ,\text{ positions $k-1,k,k+1$ and also $j-1,j,j+1$ are heads,} \end{aligned} $$
where in the last line $j$ is any of the $n-5$ indices at cyclic distance at least $3$ from $k$. So
$$ \begin{aligned} \Bbb EX^2 &= \Bbb E \sum_{k,j}X_kX_j\\ &= \sum_k\sum_j\Bbb E X_kX_j\\ &=\sum_k\left( \frac 1{2^3} +\frac 1{2^4}+\frac 1{2^4} +\frac 1{2^5}+\frac 1{2^5} +(n-5)\frac 1{2^6} \right) \\ &= \sum_k\frac 1{2^6}(8+4+4+2+2+(n-5)) = \frac {n(n+15)}{64}\ . \end{aligned} $$
Hence the variance of $X$ is
$$ \sigma^2:= \operatorname{Var}[X] = \Bbb E[X^2]-\Bbb E[X]^2 = \frac {n(n+15)}{64} - \left(\frac n8\right)^2 = \frac {15n}{64} \ . $$
So the standard deviation is $\sigma = \frac{\sqrt{15}}{8}\sqrt n$, a constant times $\sqrt n$.
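The four values of $\Bbb E X_kX_j$ and the resulting variance can be checked by exact enumeration for one $n$. A minimal Python sketch (plain Python rather than Sage; `run_products` is a hypothetical name), taking $k = 0$ by symmetry and coding heads as $1$:

```python
from fractions import Fraction
from itertools import product


def run_products(n):
    """Exact E[X_0 * X_j] for j = 0..n-1 on a cycle of n fair coins.

    X_k = 1 when positions k-1, k, k+1 (indices mod n) are all heads (1).
    """
    totals = [0] * n
    for c in product((0, 1), repeat=n):
        x = [c[(k - 1) % n] & c[k] & c[(k + 1) % n] for k in range(n)]
        for j in range(n):
            totals[j] += x[0] * x[j]
    return [Fraction(t, 2 ** n) for t in totals]


n = 10
e = run_products(n)
print(e[0], e[1], e[2], e[3])  # j = 0, 1, 2 and one index at distance 3
mean = Fraction(n, 8)
second_moment = n * sum(e)  # every k contributes the same row sum by symmetry
print(second_moment - mean ** 2, Fraction(15 * n, 64))
```

For $n = 10$ the first line shows $1/8, 1/16, 1/32, 1/64$, and the two numbers on the second line agree ($75/32$), matching the $n = 10$ Sage result.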

So we apply Chebyshev's inequality: $$ \Bbb{P}(\ |X-\Bbb{E}(X)| \geq c \sqrt{n}\ ) = \Bbb{P}\left(\ |X-\Bbb{E}(X)| \geq c \cdot\frac 8{\sqrt {15}}\sigma\ \right) \le \left(\frac {\sqrt{15}}{8c}\right)^2 =\frac {15}{64c^2} \ . $$
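To see how loose the bound can be, one can compare it with the exact tail probability for a small $n$. A minimal Python sketch, assuming fair coins on a cycle of $n$ positions as above; `tail_probability` is a hypothetical helper and the values of $c$ are arbitrary choices for illustration:

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def tail_probability(n, c):
    """Exact P(|X - n/8| >= c*sqrt(n)) for the cyclic count X of three-head runs."""
    threshold = c * sqrt(n)
    hits = 0
    for coins in product((0, 1), repeat=n):
        x = sum(coins[(k - 1) % n] & coins[k] & coins[(k + 1) % n] for k in range(n))
        if abs(x - n / 8) >= threshold:
            hits += 1
    return Fraction(hits, 2 ** n)


n = 10
for c in (0.5, 1.0, 1.5):
    bound = 15 / (64 * c ** 2)  # Chebyshev bound 15/(64 c^2)
    print(c, float(tail_probability(n, c)), bound)
```

Chebyshev guarantees the exact probability never exceeds the bound; the printout only shows the size of the gap.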

To be safe, I wanted to verify the above; the following rather simple Sage code confirms the results:

    for n in [6..12]:    # Sage range syntax: n = 6, 7, ..., 12

        R = [0, 1]       # 0 = tails, 1 = heads
        C = cartesian_product( [ R for _ in range(n) ] )
        p = 1/2^n        # weight of each element in the probability space C

        M1 = 0           # accumulates E[X]
        M2 = 0           # accumulates E[X^2]

        for c in C:
            # X(c): number of k with positions k-1, k, k+1 (mod n) all heads
            count = len( [ k for k in range(n)
                           if  c[k]       == 1
                           and c[(k-1)%n] == 1
                           and c[(k+1)%n] == 1 ] )
            M1 += p * count
            M2 += p * count^2

        V  = M2 - M1^2   # Var[X] = E[X^2] - E[X]^2

        print("n = %s" % n)
        print("\t1. st moment = %s" % M1)
        print("\t2. nd moment = %s" % M2)
        print("\tVariation    = %s" % V)

Results:

n = 6
        1. st moment = 3/4
        2. nd moment = 63/32
        Variation    = 45/32
n = 7
        1. st moment = 7/8
        2. nd moment = 77/32
        Variation    = 105/64
n = 8
        1. st moment = 1
        2. nd moment = 23/8
        Variation    = 15/8
n = 9
        1. st moment = 9/8
        2. nd moment = 27/8
        Variation    = 135/64
n = 10
        1. st moment = 5/4
        2. nd moment = 125/32
        Variation    = 75/32
n = 11
        1. st moment = 11/8
        2. nd moment = 143/32
        Variation    = 165/64
n = 12
        1. st moment = 3/2
        2. nd moment = 81/16
        Variation    = 45/16

Computing expected value of dependent random variables

Your answer is not correct.

I will not immediately point out the flaw in your reasoning, in case you prefer to find the flaw by yourself. One way to see that your answer is wrong is by doing the following sanity check: is this the right answer for small values of $n$?

For $n = 1$, there are two vertices with different colours, with an edge between them with probability $\frac{1}{2}$. However, there are no edges between vertices of the same colour, so the answer should be $0$. But your formula gives $\frac{1}{4} \cdot \binom{2}{2} = \frac{1}{4}$.

Maybe it helps if you work out the answer for $n = 2$, and then see how you can generalize this solution. If you figure it out, feel free to answer the question yourself. Otherwise, let us know if you need more help!

Added later (as requested). The flaw in your reasoning is this. In your solution, the equality $\mathbb{E}[Y]=\sum_{\{u, v\} \subseteq V}^{} \mathbb{E}\left[ Y_{\{u, v\} } \right]$ is valid. However, we don't have $\mathbb{E}\left[ Y_{\{u,v\}} \right] = \frac{1}{8}$, but rather $$ \mathbb{E}\left[ Y_{\{u,v\}} \right] = \begin{cases} \frac{1}{2},&\text{if $u$ and $v$ are blue};\\[1ex] 0,&\text{otherwise}. \end{cases} $$ After all, the expected value ranges over all edge realizations, but the colours are fixed. So if either $u$ or $v$ is not blue, then we have $Y_{\{u,v\}} = 0$ for all possible edge realizations.
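The two-case formula for $\mathbb{E}[Y_{\{u,v\}}]$ can be confirmed by brute force. The Python sketch below assumes, as the correction above indicates, that $Y$ counts edges with both endpoints blue, and fixes vertices $0,\dots,n-1$ as blue and $n,\dots,2n-1$ as red; `blue_edge_expectations` is a hypothetical helper name. For $n = 1$ it also reproduces the sanity check: the only possible edge joins differently coloured vertices, so $\mathbb{E}[Y] = 0$.

```python
from fractions import Fraction
from itertools import combinations, product


def blue_edge_expectations(n):
    """Exact E[Y_{u,v}] for every pair, and E[Y], when Y counts blue-blue edges.

    Assumption: vertices 0..n-1 are blue, n..2n-1 are red, and each potential
    edge is present independently with probability 1/2 (all edge sets equally likely).
    """
    pairs = list(combinations(range(2 * n), 2))
    hits = {p: 0 for p in pairs}
    for present in product((0, 1), repeat=len(pairs)):
        for (u, v), bit in zip(pairs, present):
            if bit and u < n and v < n:  # edge exists and both endpoints are blue
                hits[(u, v)] += 1
    per_pair = {p: Fraction(h, 2 ** len(pairs)) for p, h in hits.items()}
    return per_pair, sum(per_pair.values())


for n in (1, 2, 3):
    per_pair, total = blue_edge_expectations(n)
    print(n, total)
```

Every blue pair gets $\frac12$ and every other pair gets $0$, so summing gives $\mathbb{E}[Y] = \binom{n}{2}\cdot\frac12$.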
