[Math] Combining Correlation Coefficients

st.statistics

I have a large data set, A, containing 100 x/y pairs. I've divided it into two smaller data sets, B and C, containing 30 and 70 x/y pairs respectively.

I have Pearson's product-moment correlation r for each of the two smaller data sets, B and C. Can I combine the correlation coefficients from the two smaller sets to generate the correlation coefficient for A?

This is for a programming problem I'm working on, and my dataset, A, is very large. I need to somehow calculate the correlation coefficient for it, but I'd like to split the dataset up into many smaller datasets, calculate the correlation for each small dataset, and then combine those correlations to get my result for the dataset as a whole. Is it possible?

Thanks!

Best Answer

You can't do that, as Gerry Myerson has pointed out.
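
To see why, here is a small illustration (the numbers are my own, chosen only to make the arithmetic easy). Take $B = \{(0,0), (1,1)\}$ and $C = \{(2,2), (3,3)\}$. Each subset has $r = 1$, and their union lies on the line $y = x$, so it also has $r = 1$. Now keep $B$ but use $C' = \{(0,1), (1,2)\}$, which again has $r = 1$. For the union $B \cup C'$ we have $n = 4$, $\sum_i x_i = 2$, $\sum_i y_i = 4$, $\sum_i x_i y_i = 3$, $\sum_i x_i^2 = 2$ and $\sum_i y_i^2 = 6$, so

$$ r = {4 \cdot 3 - 2 \cdot 4 \over \sqrt{4 \cdot 2 - 2^2} \sqrt{4 \cdot 6 - 4^2}} = {4 \over 2 \sqrt{8}} = {1 \over \sqrt{2}} \approx 0.707. $$

Both cases have the same subset sizes and the same subset correlations, yet the combined correlation differs, so no formula in $r_B$, $r_C$ and the subset sizes alone can work.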

If you want a way to break down the computation, though, go back to one of the formulas for it:

$$ r_{xy} = {n \sum_i x_i y_i - \sum_i x_i \sum_i y_i \over \sqrt{n \sum_i x_i^2 - (\sum_i x_i)^2} \sqrt{n \sum_i y_i^2 - (\sum_i y_i)^2}}. $$

(See the Wikipedia article on the Pearson correlation coefficient, under "mathematical properties".)

So you just need to know $n$, $\sum_i x_i$, $\sum_i y_i$, $\sum_i x_i y_i$, $\sum_i x_i^2$ and $\sum_i y_i^2$ for the whole data set. Each of these is just the sum of the corresponding quantities over the subsets.
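
Since this is for a programming problem, here is a minimal Python sketch of the idea, assuming each chunk is a list of `(x, y)` pairs. The function names `chunk_sums`, `combine` and `pearson_from_sums` are made up for illustration. Each chunk is reduced to six totals ($n$, $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$), the totals are added across chunks, and the formula is applied once at the end.

```python
import math

def chunk_sums(pairs):
    """Reduce one chunk of (x, y) pairs to (n, Sx, Sy, Sxy, Sxx, Syy)."""
    n = sx = sy = sxy = sxx = syy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
        syy += y * y
    return (n, sx, sy, sxy, sxx, syy)

def combine(a, b):
    """Totals for the union of two chunks: add the totals component-wise."""
    return tuple(p + q for p, q in zip(a, b))

def pearson_from_sums(totals):
    """Apply the sum-based formula for r to the combined totals.

    Raises ZeroDivisionError if all x values or all y values are equal,
    since the correlation is undefined in that case.
    """
    n, sx, sy, sxy, sxx, syy = totals
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return numerator / denominator

# Two small chunks, each perfectly correlated on its own (illustrative data).
B = [(0, 0), (1, 1)]
C = [(0, 1), (1, 2)]
r = pearson_from_sums(combine(chunk_sums(B), chunk_sums(C)))
print(r)  # about 0.7071, i.e. 1/sqrt(2)
```

One caveat: with floating-point data whose values are large compared to their spread, the subtractions in this formula can lose precision, so for such data it may be worth centering the values first or using a more careful update scheme.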
