Variance of sample mean of correlated random variables is zero

correlation, covariance, expected value, random variables, statistics

Suppose we have identically distributed random variables $X_1, \dots, X_n$ with pairwise correlation $\rho$ and variance $\sigma^2$ (assume the mean is zero). Then, defining
\begin{equation}
\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i,
\end{equation}

we have:
\begin{equation}
var(\bar{X}) = \mathbb{E} [\bar{X}^2] - \mathbb{E}[\bar{X}]^2
\end{equation}

\begin{align}
\mathbb{E} [\bar{X}^2] &= \frac{1}{n^2} \mathbb{E} [\sum_i X_i^2 + \sum_i \sum_{j \neq i} X_i X_j] \\
&= \frac{1}{n^2} \{ \sum_i \mathbb{E} [X_i^2] + \sum_i \sum_{j \neq i} \mathbb{E}[X_i X_j] \} \\
&= \frac{1}{n^2} \{ \sum_i \sigma^2 + \sum_i \sum_{j \neq i} \rho \sigma^2 \} \\
&= \frac{1}{n^2} \{ n \sigma^2 + n (n-1) \rho \sigma^2 \} \\
&= \frac{\sigma^2}{n} + \frac{n-1}{n} \rho \sigma^2
\end{align}

and since each $X_i$ has mean zero, $\mathbb{E}[\bar{X}] = 0$ and hence $\mathbb{E}[\bar{X}]^2 = 0$.

So,
\begin{equation}
var(\bar{X}) = \frac{\sigma^2}{n} + \frac{n-1}{n} \rho \sigma^2
\end{equation}
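As a sanity check, this formula can be verified by simulation. A minimal sketch (using the mvtnorm package to draw equicorrelated normals; the values of $n$, $\rho$ and $\sigma$ below are purely illustrative):

library(mvtnorm)
set.seed(1)
n <- 5; rho <- 0.3; sigma <- 2
covmatrix <- sigma^2 * (matrix(rho, n, n) + diag(n) * (1 - rho))  # equicorrelated covariance matrix
sims <- rmvnorm(100000, mean = rep(0, n), sigma = covmatrix)
var(rowMeans(sims))                        # simulated var(Xbar), should be close to ...
sigma^2 / n + (n - 1) / n * rho * sigma^2  # ... the formula: 0.8 + 0.96 = 1.76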

Now suppose $n$ is fixed. If we generate identically distributed random variables with pairwise correlation $\rho = -\frac{1}{n-1}$, then
\begin{equation}
var(\bar{X}) = \frac{\sigma^2}{n} + \frac{n-1}{n} \sigma^2 (\frac{-1}{n-1}) = 0
\end{equation}
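For concreteness, evaluating the formula at a few values of $\rho$ (with $n = 4$ and $\sigma = 1$ chosen purely for illustration) shows the variance shrinking and vanishing exactly at $\rho = -\frac{1}{n-1}$:

n <- 4; sigma <- 1
rho <- c(0.5, 0, -1/(n-1))
sigma^2 / n + (n - 1) / n * rho * sigma^2  # 0.625, 0.25, and exactly 0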

This seems super weird. I must be going wrong somewhere, but I don't know where! Could someone please take a look?

Thanks in advance!

Best Answer

There is nothing wrong with your work; you have just hit an extreme (and so slightly weird) case. Note that the pairwise correlations cannot be less than $-\frac{1}{n-1}$ if you want the correlation matrix to be positive semidefinite, and even then there are restrictions on the distribution of the $X_i$ to achieve this. The effect of this extreme case is to make $\bar X = \mu$ ($= 0$ here) almost surely.
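To see where that bound comes from: the $n \times n$ equicorrelation matrix can be written as $(1-\rho)I + \rho J$, where $J$ is the all-ones matrix, so its eigenvalues are $1 + (n-1)\rho$ (once) and $1 - \rho$ ($n-1$ times); the smallest eigenvalue hits $0$ exactly at $\rho = -\frac{1}{n-1}$. A quick numerical check (taking $n = 6$ to match the simulation below):

n <- 6
rho <- -1/(n-1)
cormatrix <- matrix(rho, n, n) + diag(n) * (1 - rho)  # equicorrelation matrix at the boundary
eigen(cormatrix)$values  # 1 - rho = 1.2 (five times) and 1 + (n-1)*rho = 0 (up to rounding)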

For example, when $n=2$ this gives $\rho=-1$, and since $X_1$ and $X_2$ have identical distributions, you get $X_2=-X_1$ with probability $1$, so $\bar X=0$ is a constant with zero variance. For this to even be possible, the distribution of $X_1$ and $X_2$ must be symmetric about $0$.
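A minimal check of this $n=2$ case (standard normals are just one convenient choice of a distribution symmetric about $0$):

set.seed(2022)
x1 <- rnorm(1000)   # symmetric about 0
x2 <- -x1           # same distribution as x1, correlation exactly -1
cor(x1, x2)         # -1
var((x1 + x2) / 2)  # Xbar is identically 0, so its variance is 0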

Much the same thing happens for larger $n$. Here is an R simulation of ten observations from a multivariate normal distribution with $n=6$, using the correlation matrix (so $\sigma^2 = 1$ and $\rho = -\frac{1}{n-1} = -0.2$) as the covariance matrix:

library(mvtnorm)
set.seed(2022)
n <- 6
covmatrix <- matrix(rep(-1/(n-1), n^2), ncol=n) + diag(n)*n/(n-1)  # off-diagonal -1/(n-1), diagonal 1
covmatrix
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]  1.0 -0.2 -0.2 -0.2 -0.2 -0.2
# [2,] -0.2  1.0 -0.2 -0.2 -0.2 -0.2
# [3,] -0.2 -0.2  1.0 -0.2 -0.2 -0.2
# [4,] -0.2 -0.2 -0.2  1.0 -0.2 -0.2
# [5,] -0.2 -0.2 -0.2 -0.2  1.0 -0.2
# [6,] -0.2 -0.2 -0.2 -0.2 -0.2  1.0
sims <- rmvnorm(10, mean=rep(0,n), sigma=covmatrix)
sims
#             [,1]        [,2]        [,3]        [,4]       [,5]        [,6]
# [1,]  2.05353690 -0.21785513  0.08433481 -0.51489124  0.7048736 -2.10999909
# [2,] -1.34855528  0.11628539  0.63282192  0.07644164  0.9140224 -0.39101608
# [3,] -0.59594809  0.58136474  0.42176678  0.39159438 -0.2369455 -0.56183242
# [4,]  0.21283317  0.03699741 -0.50479400 -0.48377217  0.3156341  0.42310169
# [5,] -0.20142159 -0.76144379  0.89221894  0.53952061 -0.3872322 -0.08164199
# [6,]  0.23320763  1.01507018 -1.60352276 -0.13737918 -0.7910043  1.28362840
# [7,]  0.86343020 -0.16358892 -0.14731998 -0.50410625 -0.6139985  0.56558351
# [8,] -0.64649148 -0.98177522  0.65169196 -0.28309427  0.9583281  0.30134078
# [9,] -2.00950883 -0.17680495  1.27467798  0.51076605 -0.3755420  0.77641190
#[10,] -0.04605967  0.31143296 -0.35956513  0.90532379 -0.8817965  0.07066465
Xbar <- rowMeans(sims)
Xbar
# [1] -3.246942e-08  5.724417e-09 -1.458760e-08  2.749809e-08 -5.479594e-09
# [6] -3.169053e-09  9.711573e-09 -1.781134e-08  1.764540e-08  1.170931e-08
var(Xbar)
# [1] 3.284356e-16

So neither the simulated $\bar X$ values nor their variance is exactly $0$, but both are within numerical precision of zero, which is close enough to make the point.
