Combining Covariance Matrices – Methods and Techniques

Tags: covariance, moments

I'm calculating the covariance of a distribution in parallel, and I need to combine the distributed results into one single Gaussian. How do I combine the two partial results?

Linearly interpolating between the two almost works, if they are similarly distributed and sized.

Wikipedia provides a formula at the bottom of the page for combining them, but it doesn't seem right: combining two identically distributed samples should give the same covariance, yet the formula doubles it.

Is there a correct way to combine the two covariance matrices?

Best Answer

This question comes up a lot in various guises. What is common to them is

How can I combine moment-based statistics that have been computed from disjoint subsets of my data?

The simplest application concerns data that have been split into two groups. You know the group sizes and the group means. In terms of these four quantities alone, what is the overall mean of the data?
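For concreteness, here is a minimal sketch of that base case in Python with NumPy (the name `combined_mean` is illustrative, not a standard routine): each group mean times its size recovers that group's sum, and sums combine by addition.

```python
import numpy as np

def combined_mean(sizes, means):
    """Overall mean from subgroup sizes and subgroup means."""
    sizes = np.asarray(sizes, dtype=float)
    means = np.asarray(means, dtype=float)
    # j_i * mean_i recovers group i's sum; add the sums, divide by n.
    return np.sum(sizes * means) / np.sum(sizes)

# Sizes 3 and 2 with means 4.0 and 9.0: sums are 12 and 18,
# so the overall mean is (12 + 18) / 5 = 6.0.
print(combined_mean([3, 2], [4.0, 9.0]))  # 6.0
```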

Other applications generalize from means to variances, standard deviations, covariance matrices, skewnesses, and multivariate statistics; and might involve multiple subgroups of data. Notice that many of these quantities are somewhat complicated combinations of moments: the standard deviation, for instance, is the square root of a quadratic combination of the first and second moments (mean and mean square).

All such cases are easily handled by reducing the various moments to sums, because sums are obviously and easily combined: they are added. Mathematically, it comes down to this: you have a batch of data $X = (x_1, x_2, \ldots, x_n)$ that have been separated into disjoint groups of sizes $j_1, j_2, \ldots, j_g$: $(x_1, x_2, \ldots, x_{j_1}; x_{j_1+1}, \ldots, x_{j_1+j_2}; x_{j_1+j_2+1}, \ldots; \ldots; \ldots, x_n)$. Writing $J_i = j_1 + j_2 + \cdots + j_i$ for the cumulative sizes (with $J_0 = 0$), let's call the $i$th group $X_{(i)} = (x_{J_{i-1}+1}, x_{J_{i-1}+2}, \ldots, x_{J_i})$. By definition, the $k$th moment of any batch of data $y_1, \ldots, y_j$ is the average of $k$th powers,

$$\mu_k(y) = \left(y_1^k + y_2^k + \cdots + y_j^k\right)/j.$$

Obviously $j \mu_k(y)$ is the sum of the $k$th powers. Therefore, referring to our previous decomposition of data into $g$ subgroups, we can break a sum of $n$ powers into groups of sums, obtaining

$$\eqalign{ n \mu_k(X) &= \left(x_1^k + x_2^k + \cdots + x_n^k\right) \\ &= \left(x_1^k + x_2^k + \cdots + x_{j_1}^k\right) + \cdots + \left(x_{j_1+\cdots+j_{g-1}+1}^k + x_{j_1+\cdots+j_{g-1}+2}^k + \cdots + x_n^k\right)\\ &= j_1 \mu_k(X_{(1)}) + j_2 \mu_k(X_{(2)}) + \cdots + j_g \mu_k(X_{(g)}). }$$

Dividing by $n$ exhibits the $k$th moment of the entire batch in terms of the $k$th moments of its subgroups.
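In code this is the same size-weighted average for every $k$; here is a small sketch (the name `combined_moment` is mine), verified against the moment computed directly on the pooled data:

```python
import numpy as np

def combined_moment(sizes, moments):
    """Overall k-th moment from subgroup sizes and subgroup k-th moments."""
    sizes = np.asarray(sizes, dtype=float)
    moments = np.asarray(moments, dtype=float)
    # j_i * mu_k(X_(i)) is group i's sum of k-th powers; add and divide by n.
    return np.sum(sizes * moments) / np.sum(sizes)

# Verify against the k-th moment of the pooled data.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
a, b = x[:6], x[6:]
k = 2
direct = np.mean(x**k)
merged = combined_moment([a.size, b.size], [np.mean(a**k), np.mean(b**k)])
print(np.isclose(direct, merged))  # True
```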

In the present application, the entries in the covariance matrix are, of course, covariances, which are expressible in terms of multivariate second moments and first moments. The key part of the calculation comes down to this: at each step you will have focused on two particular components of your multivariate data; let's call them $x$ and $y$. The numbers you are looking at are in the form

$$((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)),$$

broken up as before into $g$ groups. For each group you know the average of the products $x_i y_i$: this is the $(1,1)$ multivariate moment, $\mu_{(1,1)}$. To combine these group values, multiply each one by its group size, add up those results, and divide the total by $n$.
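As a sketch of that recipe (illustrative names, population convention): each group reports its size, its two means, and its average product, and the covariance then comes out of the combined moments via the standard identity $\operatorname{cov}(x,y) = \mu_{(1,1)} - \mu_x \mu_y$.

```python
import numpy as np

def combined_xy_covariance(sizes, mean_x, mean_y, mean_xy):
    """Population covariance of (x, y) from per-group sizes,
    first moments, and (1,1) moments mu_{(1,1)} = average of x*y."""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    # Every moment combines the same way: weight by size, add, divide by n.
    mx = np.sum(sizes * np.asarray(mean_x, dtype=float)) / n
    my = np.sum(sizes * np.asarray(mean_y, dtype=float)) / n
    mxy = np.sum(sizes * np.asarray(mean_xy, dtype=float)) / n
    # Population covariance: E[xy] - E[x] E[y].
    return mxy - mx * my

# Verify against the population covariance of the pooled data.
rng = np.random.default_rng(2)
x, y = rng.normal(size=20), rng.normal(size=20)
parts = [slice(0, 12), slice(12, 20)]
cov = combined_xy_covariance(
    [12, 8],
    [x[s].mean() for s in parts],
    [y[s].mean() for s in parts],
    [(x[s] * y[s]).mean() for s in parts],
)
print(np.isclose(cov, np.cov(x, y, ddof=0)[0, 1]))  # True
```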

To apply this approach you need to think ahead: it is not possible to combine, say, covariances if you know only the covariances and the subgroup sizes. You also need to know the means of the subgroups (because means are involved in an essential way in all covariance formulas), or something algebraically reducible to the means. You might also need to take some care concerning any constants that appear in the formulas; the chief trap for the unwary is to confuse a "sample covariance" (where the sum of products is divided by $n-1$) with a "population covariance" (where the division is by $n$). This introduces nothing new; you just have to remember to multiply a sample covariance by $n-1$ (or a group's sample covariance by $j_i-1$) to recover the sum, rather than by $n$ (or $j_i$).
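Putting the pieces together for covariance matrices, here is a sketch assuming each group reports its size, mean vector, and *sample* covariance matrix (the $n-1$ convention, which is what `np.cov` returns by default); `pooled_sample_covariance` is an illustrative name, not a library function.

```python
import numpy as np

def pooled_sample_covariance(sizes, means, covs):
    """Overall mean vector and *sample* covariance matrix (divisor n - 1)
    from per-group sizes, mean vectors, and sample covariance matrices."""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    # Overall mean: size-weighted average of the group means.
    mean = sum(j * np.asarray(m) for j, m in zip(sizes, means)) / n
    # (j_i - 1) * C_i undoes the sample-covariance division to recover each
    # group's sum of products about its own mean; the outer-product term
    # shifts that sum so it is taken about the overall mean instead.
    total = sum(
        (j - 1.0) * np.asarray(c)
        + j * np.outer(np.asarray(m) - mean, np.asarray(m) - mean)
        for j, m, c in zip(sizes, means, covs)
    )
    # Divide once at the end, using the sample (n - 1) convention.
    return mean, total / (n - 1.0)

# Check against the covariance of the pooled data.
rng = np.random.default_rng(1)
x = rng.normal(size=(9, 3))
a, b = x[:5], x[5:]
_, cov = pooled_sample_covariance(
    [len(a), len(b)],
    [a.mean(axis=0), b.mean(axis=0)],
    [np.cov(a, rowvar=False), np.cov(b, rowvar=False)],
)
print(np.allclose(cov, np.cov(x, rowvar=False)))  # True
```

Note that merging a group with an identical copy of itself returns essentially the same covariance rather than twice it (up to the $n-1$ versus $n$ convention), which is exactly the behaviour the question was hoping for.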


Oh, yes: about the present question. The formula in the Wikipedia article is expressed in terms of the group means (first moments) and the group sums of products. As described above, these get combined by adding them and then dividing to obtain the covariance. The final division by $n$ is not shown, which is why naively combining two identical groups appears to double the covariance.