Solved – Combining 2 covariance matrices

covariance-matrix

Given 2 square, symmetric covariance matrices computed from samples of unequal size, can the following equation be used to compute the combined covariance? I have been reading this Wikipedia article https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance but was wondering why the equation below is not right (if it is not right).

$ C_x = \frac{C_aN_a + C_bN_b}{N_a+N_b} $

where C denotes the covariance and N the sample size.

Can anyone help me with this?

Best Answer

It's a little unclear what you're asking here (the Wikipedia page deals with the ordinary scalar covariance, whereas the question concerns covariance matrices), but I'll try to answer the question with regard to unbiasedness.

Assuming your covariance matrices are computed from the samples $\{ \textbf{a}_i \}_{i=1}^{N_a}$ and $\{ \textbf{b}_j \}_{j=1}^{N_b}$, the usual definition for the sample covariance matrix is $$ \textbf{C}_a = \frac{1}{N_a - 1} \sum_{i=1}^{N_a} ( \textbf{a}_i - \bar{\textbf{a}}) ( \textbf{a}_i - \bar{\textbf{a}})^T, $$ and similarly for $\textbf{C}_b$. Note that, for independent and identically distributed observations, the denominator $N_a - 1$ makes the sample covariance matrix unbiased: $E[\textbf{C}_a] = Cov(\textbf{a}_i )$.
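As a minimal sketch of the definition above, the following computes the sample covariance matrix by hand and compares it with `np.cov`. The function name `sample_covariance`, the use of NumPy, and the random test data are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np

def sample_covariance(samples):
    """Unbiased sample covariance of an (N, d) array whose rows are observations."""
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    # Subtract the sample mean vector from every observation.
    centered = samples - samples.mean(axis=0)
    # Sum of outer products of the centered rows, divided by N - 1.
    return centered.T @ centered / (n - 1)

rng = np.random.default_rng(0)
a = rng.normal(size=(50, 3))  # hypothetical sample: 50 observations in 3 dimensions
C_a = sample_covariance(a)

# np.cov also divides by N - 1 by default; rowvar=False treats rows as observations.
matches_numpy = np.allclose(C_a, np.cov(a, rowvar=False))
```

The result is a symmetric $d \times d$ matrix, here $3 \times 3$.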

With this in mind, if you now compute the expected value of your proposed combined covariance, linearity of expectation together with $E[\textbf{C}_a] = Cov(\textbf{a}_i)$ and $E[\textbf{C}_b] = Cov(\textbf{b}_j)$ gives: $$ E[\textbf{C}_x] = \frac{N_a}{N_a + N_b} Cov(\textbf{a}_i) + \frac{N_b}{N_a + N_b} Cov(\textbf{b}_j).$$ By itself this is of little use, but if we furthermore assume that the two samples come from populations with equal covariance matrices (as is often done, see e.g. Hotelling's $T^2$ test), that is, $ Cov(\textbf{a}_i) = Cov(\textbf{b}_j) = \boldsymbol{\Sigma}$, the two weights sum to one and we then have $$E[\textbf{C}_x] = \boldsymbol{\Sigma}. $$ Thus $\textbf{C}_x$ is unbiased for the common population covariance, and in that sense what you proposed is indeed a "correct" way of combining the two estimators.
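The unbiasedness argument above can be checked numerically. The sketch below averages the proposed combined estimator over many simulated pairs of samples that share one covariance matrix. The helper name `combine_covariances`, the Gaussian populations, the particular $\boldsymbol{\Sigma}$, the group means, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np

def combine_covariances(C_a, N_a, C_b, N_b):
    """Sample-size-weighted average (N_a*C_a + N_b*C_b) / (N_a + N_b)."""
    return (N_a * np.asarray(C_a) + N_b * np.asarray(C_b)) / (N_a + N_b)

rng = np.random.default_rng(42)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # assumed common population covariance
N_a, N_b = 8, 20                # unequal sample sizes
trials = 5000

total = np.zeros_like(sigma)
for _ in range(trials):
    # Both groups share sigma; their means are chosen arbitrarily.
    a = rng.multivariate_normal([0.0, 0.0], sigma, size=N_a)
    b = rng.multivariate_normal([5.0, -3.0], sigma, size=N_b)
    C_a = np.cov(a, rowvar=False)  # unbiased, N_a - 1 denominator
    C_b = np.cov(b, rowvar=False)  # unbiased, N_b - 1 denominator
    total += combine_covariances(C_a, N_a, C_b, N_b)

# Monte Carlo estimate of E[C_x]; it should be close to sigma.
mean_C_x = total / trials
print(mean_C_x)
```

With this many trials the printed average should agree with `sigma` up to simulation noise, which is what $E[\textbf{C}_x] = \boldsymbol{\Sigma}$ predicts under the equal-covariance assumption.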