[Math] Covariance between given iid random variable and sample average

covariancemeansvariance

Apologies if this may look like a silly question.

Consider a sample of size $N$. Let $y_i$ be the $i$-th realization of a random variable that is independently and identically distributed with mean $\mu$ and variance $\sigma^2$.

Let $\bar{y}$ be the mean of $y_i$ across all $i$. We know that $Var(\bar{y})=\sigma^2/N$.

How about $Cov(y_i,\bar{y})$? I would say that

$Cov(y_i,\bar{y})=Cov\left(y_i,\frac{\sum_i y_i}{N}\right)=\frac{1}{N}\left[Var(y_i)+(N-1)Cov(y_i,y_j)\right]=\sigma^2/N$

Where the latter follows because of the independence assumption. Is that right? Many thanks.

Best Answer

Overall this looks good. I have a few minor concerns and comments:

It would help to specify the definition $\overline{Y}= \frac{1}{N}\sum_{i=1}^N Y_i$. (At first I was confused since I thought you meant $\overline{Y}=E[Y_i]$).
I would personally prefer to write your equality as $$ Cov(Y_i, \overline{Y}) = \frac{1}{N}\left[Var(Y_i) + \sum_{j \neq i} Cov(Y_i, Y_j)\right] $$ This avoids the "dangling and undefined index $j$" in
$$ Cov(Y_i, \overline{Y}) = \frac{1}{N}\left[Var(Y_i) + (N-1)\underbrace{Cov(Y_i,Y_j)}_{\mbox{undefined $j$}}\right]$$
The basic principles you are applying are \begin{align} Cov(Z, aW) &= a Cov(Z, W)\\ Cov\left(Z, \sum_{i=1}^N R_i\right) &= \sum_{i=1}^N Cov(Z, R_i) \end{align} for any random variables $Z, W, R_1, ..., R_N$ with finite means and variances, and any scalar $a$ and positive integer $N$.

Related Solutions

[Math] Deriving the variance of the sample mean $\mathrm{Var} (\bar{y})=\frac{1}{n}(1-\frac{n}{N})S^2$

Writing the expansion a little differently, we have $$ \mathrm{Var}(\bar{y})=\frac{1}{n^2}\left[\sum_{i=1}^ny_i^2\mathrm{Var}(I_i)+\sum_{i=1}^n\sum_{j=1,j\neq i}^ny_iy_j\mathrm{Cov}(I_i,I_j)\right]. $$ Now, the number of samples containing both $i$ and $j$ is $\binom{N-2}{n-2}$ and hence the probability of including both $i$ and $j$ in the sample is (important that $i\neq j$) $$ {\rm E}[I_iI_j]=P(I_i=1,I_j=1)=\frac{\binom{N-2}{n-2}}{\binom{N}{n}}\frac{n(n-1)}{N(N-1)}. $$ Thus $$ \mathrm{Cov}(I_i,I_j)={\rm E}[I_iI_j]-{\rm E}[I_i]{\rm E}[I_j]=\frac{n(n-1)}{N(N-1)}-\left(\frac{n}{N}\right)^2=-\frac{n(1-n/N)}{N(N-1)} $$ and we conclude that $$ \begin{align} \mathrm{Var}(\bar{y})&=\frac{1}{n^2}\left[\sum_{i=1}^n y_i^2\frac{n}{N}\left(1-\frac{n}{N}\right)-\sum_{i=1}^n\sum_{j\neq i}y_iy_j\frac{n(1-n/N)}{N(N-1)}\right]\\ &=\frac{1}{n}\left(1-\frac{n}{N}\right)\frac{1}{N}\left[\sum_{i=1}^n y_i^2-\frac{1}{N-1}\sum_{i=1}^n\sum_{j\neq i}y_iy_j\right]\\ &=\frac{1}{n}\left(1-\frac{n}{N}\right)\frac{1}{N-1}\sum_{i=1}^n (y_i-\mu)^2\\ &=\left(1-\frac{n}{N}\right)\frac{S^2}{n}, \end{align} $$ where $\mu=\frac{1}{N}\sum\limits_{i=1}^N y_i$ is the population mean.

Variance of sample mean of correlated random variables is zero

There is nothing wrong with your work. It is just that you have an extreme (and so slightly weird) case. Note that the pairwise correlations cannot be less than $-\frac1{n-1}$ if you want the matrix to be positive semidefinite and even then there will be restrictions on the distribution of the $X_i$ to achieve this. The extreme effect is to make $\bar X=\mu$ $(=0$ here$)$ almost surely.

For example when $n=2$ this gives $\rho=-1$ and since $X_1$ and $X_2$ have identical distributions, you get $X_2=-X_1$ with probability $1$ and so $\bar X=0$, a constant with zero variance. For this to even be possible, you need the distribution of $X_1$ and $X_2$ to be symmetric about $0$.

Much the same thing happens with larger $n$. Here is a simulation using R of ten observations from a multivariate normal distribution and taking the correlation matrix as the covariance matrix when $n=6$:

library(mvtnorm)
set.seed(2022)
n <- 6
covmatrix <- matrix(rep(-1/(n-1), n^2), ncol=n) + diag(n)*n/(n-1) 
covmatrix
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]  1.0 -0.2 -0.2 -0.2 -0.2 -0.2
# [2,] -0.2  1.0 -0.2 -0.2 -0.2 -0.2
# [3,] -0.2 -0.2  1.0 -0.2 -0.2 -0.2
# [4,] -0.2 -0.2 -0.2  1.0 -0.2 -0.2
# [5,] -0.2 -0.2 -0.2 -0.2  1.0 -0.2
# [6,] -0.2 -0.2 -0.2 -0.2 -0.2  1.0
sims <- rmvnorm(10, mean=rep(0,n), sigma=covmatrix)
sims
#             [,1]        [,2]        [,3]        [,4]       [,5]        [,6]
# [1,]  2.05353690 -0.21785513  0.08433481 -0.51489124  0.7048736 -2.10999909
# [2,] -1.34855528  0.11628539  0.63282192  0.07644164  0.9140224 -0.39101608
# [3,] -0.59594809  0.58136474  0.42176678  0.39159438 -0.2369455 -0.56183242
# [4,]  0.21283317  0.03699741 -0.50479400 -0.48377217  0.3156341  0.42310169
# [5,] -0.20142159 -0.76144379  0.89221894  0.53952061 -0.3872322 -0.08164199
# [6,]  0.23320763  1.01507018 -1.60352276 -0.13737918 -0.7910043  1.28362840
# [7,]  0.86343020 -0.16358892 -0.14731998 -0.50410625 -0.6139985  0.56558351
# [8,] -0.64649148 -0.98177522  0.65169196 -0.28309427  0.9583281  0.30134078
# [9,] -2.00950883 -0.17680495  1.27467798  0.51076605 -0.3755420  0.77641190
#[10,] -0.04605967  0.31143296 -0.35956513  0.90532379 -0.8817965  0.07066465
Xbar <- rowMeans(sims)
Xbar
# [1] -3.246942e-08  5.724417e-09 -1.458760e-08  2.749809e-08 -5.479594e-09
# [6] -3.169053e-09  9.711573e-09 -1.781134e-08  1.764540e-08  1.170931e-08
var(Xbar)
# [1] 3.284356e-16

so neither the simulated $\bar X$s nor their variance are precisely $0$ but close enough to make the point.

Best Answer

Related Solutions

[Math] Deriving the variance of the sample mean $\mathrm{Var} (\bar{y})=\frac{1}{n}(1-\frac{n}{N})S^2$

Variance of sample mean of correlated random variables is zero

Related Question