Covariance written sometimes as a double summation over all pairs of probability indices, and other times as a single summation.

covariance, notation, summation

When reading about covariance, I've seen covariance represented sometimes as a double summation, and other times as a single summation.

An example of the single summation version can be seen in Wolfram's definition:

$Cov(X,Y) = \sum\limits_{i} p_{i} (x_i- \mu_X) (y_i-\mu_Y)$

An example of the double summation can be seen on p. 545 of Strang's *Introduction to Linear Algebra*, 5th Edition, or in this Math Stack Exchange post:

$Cov(X,Y) = \sum\limits_{i}\sum\limits_{j} p_{ij} (x_i- \mu_X) (y_j-\mu_Y)$
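As a concrete check on the double-summation form, here is a small numerical sketch (the values and joint probabilities are made up for illustration) that computes $\mathrm{Cov}(X,Y)$ from a joint probability table, with the means $\mu_X$ and $\mu_Y$ taken from the marginals:

```python
import numpy as np

# Hypothetical joint distribution: p[i, j] = P(X = x_i, Y = y_j).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0])
p = np.array([[0.10, 0.20],
              [0.30, 0.10],
              [0.15, 0.15]])  # entries sum to 1

mu_x = np.sum(p.sum(axis=1) * x)  # E[X] via the marginal of X
mu_y = np.sum(p.sum(axis=0) * y)  # E[Y] via the marginal of Y

# Double summation over every (i, j) pair.
cov = sum(p[i, j] * (x[i] - mu_x) * (y[j] - mu_y)
          for i in range(len(x)) for j in range(len(y)))
print(cov)  # ≈ -0.05
```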

There doesn't seem to be an intuitive way to derive one from the other, and none of the definitions of covariance I've seen bothers to explain or compare the two formulas. How should I interpret them?

Best Answer

This question was answered by David K as part of another question I asked; see the accepted answer in this post: https://math.stackexchange.com/a/3984564/333560.

To summarize his explanation: the single summation is used when only a specific sequence of value pairs $(x_i, y_i)$, indexed by the summation index, has non-zero probability. The other combinations of values still exist (as in the double summation), but since their probability is zero we can omit them; setting $p_{ij} = p_i$ when $i = j$ and $p_{ij} = 0$ otherwise collapses the double summation to the single one. This is the case when we formulate $N$ discrete random numbers as a vector in $N$-dimensional space.
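This reduction can be verified numerically. The sketch below (with made-up values $x_k$, $y_k$ and probabilities $p_k$) evaluates the single-summation form directly, then evaluates the double-summation form using a joint table that is zero everywhere except on the supported pairs, and confirms the two agree:

```python
import numpy as np

# Hypothetical discrete distribution supported only on the paired
# values (x_k, y_k), each with probability pi[k].
xv = np.array([0.0, 1.0, 2.0])
yv = np.array([1.0, 0.0, 1.0])
pi = np.array([0.5, 0.3, 0.2])  # sums to 1

mu_x = np.sum(pi * xv)
mu_y = np.sum(pi * yv)

# Single-summation form over the supported pairs.
cov_single = np.sum(pi * (xv - mu_x) * (yv - mu_y))

# Double-summation form with p[i, j] = pi[i] when i == j, else 0.
p = np.diag(pi)
cov_double = sum(p[i, j] * (xv[i] - mu_x) * (yv[j] - mu_y)
                 for i in range(3) for j in range(3))

print(np.isclose(cov_single, cov_double))  # the two forms agree
```

The off-diagonal terms of the double sum each carry a factor of zero, so only the $i = j$ terms survive, which is exactly the single summation.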
