Solved – Random variables have non-zero covariance but expected sample covariance is zero? (intuition)

Tags: correlation, covariance, cross-correlation, intuition

This post asks why a familiar and widely used estimator of the sample covariance has expected value zero in a situation where the variables involved have non-zero and equal pair-wise covariances.

Specifically, the setup is as follows: we have a sequence of identically distributed random variables $\{X_1,\dots,X_n\}$, and another sequence $\{Y_1,\dots,Y_n\}$ whose elements are also identically distributed, but with a distribution different from that of the $X$'s.
Moreover, the following holds:

$${\rm Cov}(X_i,Y_j) = {\rm Cov}(X_j,Y_i) \neq 0, \quad \forall\, i,j \in \{1,\dots,n\} \tag{1}$$

Note that, because $(1)$ holds for $i=j$ as well as for $i\neq j$, it also implies that

$$ {\rm Cov}(X_i,Y_j) = {\rm Cov}(X_i,Y_i) \tag{2}$$

This is critical for the results to follow.

(Note: Initially I had described the associations above as "equi-cross-correlation", but if you look at the comments of the thread it appears that the term describes something weaker, so I erased all references to it.)

Since the elements of each sequence are identically distributed, we have $E(X_i) = E(X_j) = E(X)$ and $E(Y_i) = E(Y_j) = E(Y)$. Then, since ${\rm Cov}(X_i,Y_j) = E(X_iY_j) - E(X)E(Y)$, having equal pair-wise covariances for $i\neq j$ as well as for $i=j$ requires

$$E(X_iY_j) = E(X_jY_i) = E(X_iY_i) = E(XY), \quad \forall\, i,j \in \{1,\dots,n\}$$

Consider now what we know as the unbiased covariance estimator

$$\widehat{\rm Cov}(X, Y) = \frac 1{n-1}\sum_{i=1}^n(X_i-\bar X)(Y_i-\bar Y)$$

with $\bar X = \frac 1{n}\sum_{i=1}^nX_i$ and likewise for the $Y$'s.
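
As a quick sanity check (my addition, using arbitrary simulated data), the formula above is exactly the quantity that R's built-in cov() computes:

# Verify that the displayed estimator matches R's cov() (arbitrary data).
set.seed(1)
n <- 10
x <- rnorm(n); y <- rnorm(n)
manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
all.equal(manual, cov(x, y))  # TRUE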

Expanding the product, we get

$$\widehat{\rm Cov}(X, Y) = \frac 1{n-1}\sum_{i=1}^nX_iY_i - \frac n{n-1}\left(\frac 1n \sum_{i=1}^nX_i\right) \left(\frac 1n \sum_{i=1}^nY_i\right)$$

$$= \frac 1{n-1}\sum_{i=1}^nX_iY_i - \frac n{n-1}\frac 1{n^2}\left(\sum_{i=1}^n\sum_{j=1}^nX_iY_j\right)$$
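
(Continuing the sanity check above, the expanded form agrees as well:)

expanded <- sum(x * y)/(n - 1) - (n/(n - 1)) * mean(x) * mean(y)
all.equal(expanded, cov(x, y))  # TRUE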

Taking the expected value of the estimator and using linearity of expectation,

$$E\left[\widehat{\rm Cov}(X, Y)\right] = \frac 1{n-1}\sum_{i=1}^nE(X_iY_i) - \frac n{n-1}\frac 1{n^2}\left(\sum_{i=1}^n\sum_{j=1}^nE(X_iY_j)\right)$$

From the previous discussion, we have $E(X_iY_i) = E(X_iY_j) = E(X_jY_i) = E(XY)$. Moreover, the double sum has $n^2$ terms, so we get

$$E\left[\widehat{\rm Cov}(X, Y)\right] = \frac 1{n-1}\,nE(XY) - \frac n{n-1}\frac 1{n^2}\,n^2E(XY) = 0$$

Great. We have "seriously entangled" (and "linearly" so) random variables, and the unbiased sample covariance, an almost "automatic" metric to calculate when getting to know the data, has expected value zero…
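
To make this concrete, here is a minimal simulation sketch. It assumes one simple way (my own construction, not the only one) to satisfy $(1)$ and $(2)$: shared components $X_i = Z + \varepsilon_i$ and $Y_i = W + \eta_i$, with ${\rm Cov}(Z,W) = \alpha \neq 0$ and noise terms independent of everything else, so that ${\rm Cov}(X_i, Y_j) = \alpha$ for all $i,j$.

# A sketch of variables satisfying (1)-(2) via shared components Z and W
# (my construction): check that the sample covariance averages to ~0.
set.seed(17)
n <- 8; alpha <- -0.6; n.rep <- 1e4
covhat <- replicate(n.rep, {
  z <- rnorm(1)
  w <- alpha * z + sqrt(1 - alpha^2) * rnorm(1)  # Cov(Z, W) = alpha
  x <- z + rnorm(n, sd = 0.1)  # X_i = Z + small independent noise
  y <- w + rnorm(n, sd = 0.1)  # Y_i = W + small independent noise
  cov(x, y)                    # the unbiased sample covariance
})
mean(covhat)  # approximately 0, despite Cov(X_i, Y_j) = alpha for all i, j

In this particular construction the deviations $X_i - \bar X = \varepsilon_i - \bar\varepsilon$ contain no trace of $Z$, so the estimator is effectively computing the covariance of the independent noises; that is the zero-expectation result in its starkest form.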

Some twisted, "Theater of the Absurd" intuition can be gleaned from the phrase "if we cannot distinguish between the pair $\{X_i, Y_i\}$ and the pair $\{X_i, Y_j\}$, as regards covariance, we "conclude" that said covariance is zero", but for the time being this sounds more absurd than intuitive.

I understand that the situation described by assumptions $(1)$ and $(2)$ may be of rather limited applied interest, even for moderately large $n$, because if we try to translate it into real-world relations, it depicts too many, and at the same time too similar, associations to be plausible.

But I feel this is not just a "theoretical curiosity"; it may be telling us something useful about the limitations of our tools, something that may already be well known. Since it is not well known to me, I decided to post it as a question.

Any ideas or explanations to better understand the above situation?

"Layman" approaches as well as advanced mathematical ones are equally welcome.

Best Answer

The conditions on the covariances will force the $X_i$ to be strongly correlated to one another, and the $Y_j$ to be strongly correlated to each other, when the mutual correlations between the $X_i$ and $Y_j$ are nonzero. As a model to develop intuition, then, let's let both $(X_i)$ and $(Y_j)$ have an exponential autocorrelation function

$$\rho(X_i, X_j) = \rho(Y_i, Y_j) = \rho^{|i-j|}$$

for some $\rho$ near $1$. Also take every $X_i$ and $Y_j$ to have zero expectation and unit variance. Let $\text{Cov}(X_i,Y_j)=\alpha$. (For any given $n$ and $\alpha$, the possible values of $\rho$ will be limited to an interval containing $1$ due to the necessity of creating a positive-definite correlation matrix.)

In this model the covariance (equally well, the correlation) matrix in terms of $(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$ will look like

$$\begin{pmatrix} 1 & \rho & \cdots & \rho^{n-1} & \alpha & \alpha & \cdots & \alpha \\ \rho & 1 & \cdots & \rho^{n-2} & \alpha & \alpha & \cdots & \alpha \\ \vdots & \vdots & \cdots & \vdots & \vdots & \vdots & \cdots & \vdots \\ \rho^{n-1} & \cdots & \rho & 1 & \alpha & \alpha & \cdots & \alpha \\ \alpha & \alpha & \cdots & \alpha & 1 & \rho & \cdots & \rho^{n-1} \\ \alpha & \alpha & \cdots & \alpha &\rho & 1 & \cdots & \rho^{n-2} \\ \vdots & \vdots & \cdots & \vdots & \vdots & \vdots & \cdots & \vdots \\ \alpha & \alpha & \cdots & \alpha & \rho^{n-1} & \cdots & \rho & 1 \end{pmatrix}$$
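
As an aside, one can probe the admissible interval of $\rho$ mentioned above numerically. The following sketch (my addition) scans $\rho$ for the fixed $n$ and $\alpha$ used below:

# Scan rho to see which values yield a positive-definite matrix
# for fixed n and alpha (illustrating the constraint noted above).
n <- 8; alpha <- -0.6
is.pd <- function(rho) {
  s11 <- outer(1:n, 1:n, function(i, j) rho^abs(i - j))
  s12 <- matrix(alpha, n, n)
  sigma <- rbind(cbind(s11, s12), cbind(s12, s11))
  min(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) > 0
}
rhos <- seq(0, 0.999, by = 0.001)
range(rhos[sapply(rhos, is.pd)])  # admissible values lie near 1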

A simulation (using $2n$-variate Normal random variables) explains much. This figure is a scatterplot of all $(X_i,Y_i)$ from $1000$ independent draws with $\rho=0.99$, $\alpha=-0.6$, and $n=8$.

[Figure 1: scatterplot of the simulated $(X_i,Y_i)$ pairs]

The gray dots show all $8000$ pairs $(X_i,Y_i)$. The first $70$ of these $1000$ realizations have been separately colored and surrounded by $80\%$ confidence ellipses (to form visual outlines of each group).

The orientations of these ellipses have a uniform distribution: on average, there is no correlation among individual collections $((X_1,Y_1), \ldots, (X_n,Y_n))$.

[Figure 2: histogram of the ellipse orientations]

However, due to the induced positive correlation among the $X_i$ (equally well, among the $Y_j$), all the $X_i$ for any given realization tend to be tightly clustered. From one realization to another they tend to line up along a downward slanting line, with some scatter around it, thereby realizing a cloud of correlation $\alpha=-0.6$.

We might summarize the situation by saying that, by recentering the data, the sample covariance does not account for the variation among the means of the $X_i$ and the means of the $Y_j$. Since, in this model, the correlation between those two means is essentially the same as the correlation between any $X_i$ and any $Y_j$ (namely $\alpha$), the expected correlation nets out to zero.


Here is working R code to play with the simulation.

library(MASS)   # for mvrnorm
#set.seed(17)
n.sim <- 1000   # number of independent realizations
alpha <- -0.6   # common cross-covariance Cov(X_i, Y_j)
rho <- 0.99     # within-sequence autocorrelation parameter
n <- 8          # length of each sequence
mu <- rep(0, 2*n)
# Assemble the block covariance matrix displayed above.
sigma.11 <- outer(1:n, 1:n, function(i,j) rho^(abs(i-j)))  # AR(1)-type block
sigma.12 <- matrix(alpha, n, n)                            # constant cross block
sigma <- rbind(cbind(sigma.11, sigma.12), cbind(sigma.12, sigma.11))
min(eigen(sigma)$values) # Must be positive for sigma to be valid.
x <- mvrnorm(n.sim, mu, sigma)  # rows are realizations of (X_1..X_n, Y_1..Y_n)
#pairs(x[, 1:n], pch=".")
library(car)    # for dataEllipse
# Plot one realization's (X_i, Y_i) points and its 80% ellipse (optionally),
# and return the orientation of the ellipse's major axis, modulo pi.
ell <- function(x, color, plot=TRUE) {
  if (plot) {
    points(x[1:n], x[1:n+n], pch=1, col=color)
    dataEllipse(x[1:n], x[1:n+n], levels=0.8, add=TRUE, col=color,
                center.cex=1, fill=TRUE, fill.alpha=0.1, robust=TRUE)
  }
  v <- eigen(cov(cbind(x[1:n], x[1:n+n])))$vectors[, 1]  # first principal axis
  atan2(v[2], v[1]) %% pi
}
n.plot <- min(70, n.sim)
colors <- rainbow(n.plot)
# Gray scatterplot of all (X_i, Y_i) pairs, then overlay the first n.plot
# realizations in color with their ellipses (Figure 1).
plot(as.vector(x[, 1:n]), as.vector(x[, 1:n + n]), type="p", pch=".", col=gray(.4),
     xlab="X", ylab="Y")
invisible(sapply(1:n.plot, function(i) ell(x[i,], colors[i])))
# Collect all n.sim ellipse orientations and histogram them (Figure 2).
ev <- sapply(1:n.sim, function(i) ell(x[i,], color=colors[i], plot=FALSE))
hist(ev, breaks=seq(0, pi, by=pi/10))
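
And a follow-up check of the summary above (my addition; run after the code above): across realizations, the per-realization means of the $X$'s and of the $Y$'s covary at $\alpha$.

# Across realizations, the means of the X's and the means of the Y's
# covary at alpha, as claimed in the summary above.
x.bar <- rowMeans(x[, 1:n])
y.bar <- rowMeans(x[, 1:n + n])
cov(x.bar, y.bar)  # expectation is exactly alpha = -0.6
cor(x.bar, y.bar)  # close to alpha when rho is near 1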