Correlations of correlations using 3 data sets

correlation, r

I'm interested in how 14 environmental parameters co-vary with one another and whether they show the same patterns in 3 regions. First, I made a pairwise correlation table of the 14 environmental variables for each region. Then I made another pairwise correlation table of the correlation values from the 3 regions (i.e. region1 vs region2, region1 vs region3, and region2 vs region3). I found a good correlation (a similar pattern) between region2 and region3 (regions are called "sector" in the graphs).

[Figure: Correlations]

What I want to do from here is to put all the correlation values from the 3 regions together in one graph and also calculate correlation values (i.e. correlations of correlation values), because I want to know which environmental variables are well correlated in all 3 regions. I thought of using PCA, but it's not really what I imagined to be the best way to present a pattern. Maybe a 3D graph would be good for presentation. I'm an R user. If you could show me in R how to make that kind of graph and how to calculate 3-way correlations, that would be much appreciated.

Thanks!

Best Answer

In very broad terms I'd question the value of this. It is easy to concoct examples in which correlations are similar but the relationships between variables are different -- and in which correlations are different but the relationships between variables are similar. I write not only as someone interested in statistics but also as someone whose main applications are with environmental data.
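A standard illustration of the first case is Anscombe's quartet, which ships with R as the `anscombe` data frame: four x-y pairs whose correlations are nearly identical (all about 0.82) while the relationships are plainly different. A minimal sketch:

```r
# Anscombe's quartet: four small data sets built into base R.
data(anscombe)

# Pearson correlation of each x-y pair: x1 with y1, ..., x4 with y4.
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(r) <- paste0("set", 1:4)
print(round(r, 2))

# Plotting the four sets shows how different the relationships are.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), pch = 1)
}
par(op)
```

The printed correlations agree to about two decimal places, yet the plots show a linear trend, a curve, a line with one outlier, and a single influential point.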

Also, what you are proposing puts enormous weight on correlations as a catch-all summary measure, which necessarily cannot do justice to nonlinearities, clustering, outliers, etc., all of which are commonplace with environmental data. An analysis of analyses is not out of the question, but the great risk is that each analysis step is a step away from the data you are trying to understand.

Yet another negative: It is difficult to make sense of your graphs without labelling which correlation is which using the names of the variables. You have presumably 91 correlations, but labelling them all will just be confusing; labelling none of them will just be uninformative.
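For reference, 91 is the number of distinct pairs among 14 variables, `choose(14, 2)`, and R can keep the variable names attached to each correlation so that selected points could be labelled. A minimal sketch, assuming one data frame per region; the data are simulated and the names `dat`, `vars`, `pair`, and `cors` are hypothetical, not taken from the question:

```r
set.seed(1)                        # simulated data, for illustration only
p <- 14                            # number of environmental variables
vars <- paste0("v", 1:p)           # placeholder variable names
regions <- paste0("region", 1:3)

# One simulated data frame per region (50 rows each, an arbitrary choice).
dat <- lapply(regions, function(g) {
  m <- matrix(rnorm(50 * p), ncol = p, dimnames = list(NULL, vars))
  as.data.frame(m)
})
names(dat) <- regions

# Positions in the lower triangle: one entry per distinct pair of variables.
lt <- lower.tri(diag(p))
idx <- which(lt, arr.ind = TRUE)
pair <- paste(vars[idx[, "col"]], vars[idx[, "row"]], sep = ":")

# Rows = variable pairs, columns = regions, values = Pearson correlations.
cors <- sapply(dat, function(d) cor(d)[lt])
rownames(cors) <- pair

nrow(cors)     # number of pairs, equal to choose(14, 2)
head(cors)     # each correlation is identified by its pair of variables
cor(cors)      # 3 x 3 matrix of "correlations of correlations" between regions
```

With the pair names as row names, a plot could label only a handful of points of interest (for example with `text()`), which is one way around the all-or-nothing labelling problem.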

Suggesting a positive alternative would depend on a deeper acquaintance with your scientific objectives, but if these were my data I would start with a single pooled multivariate analysis of three regions and then see whether regions cluster in some low-dimensional subspace. PCA does indeed spring to mind if your variables are mostly or all measured variables.
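As a rough sketch of what that starting point might look like in R, the following pools all observations, runs one PCA on standardised variables with `prcomp()`, and plots the scores by region. The data are simulated, and the sample size, the shift applied to one region, and all object names are assumptions made for illustration:

```r
set.seed(2)                                   # simulated data, for illustration only
p <- 14                                       # number of variables
n <- 40                                       # rows per region (arbitrary)
region <- factor(rep(paste0("region", 1:3), each = n))

# Simulated measurements; region3 is shifted on four variables so that
# there is some structure for the PCA to find.
x <- matrix(rnorm(3 * n * p), ncol = p,
            dimnames = list(NULL, paste0("v", 1:p)))
x[region == "region3", 1:4] <- x[region == "region3", 1:4] + 2

# One pooled PCA on centred, standardised variables.
pca <- prcomp(x, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:3]                # variance explained by PC1 to PC3

# Scores on the first two components, one open symbol per region.
sym <- c(1, 2, 0)
plot(pca$x[, 1], pca$x[, 2], pch = sym[as.integer(region)],
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(region), pch = sym)
```

If the regions separate in the score plot, the loadings in `pca$rotation` indicate which variables drive the difference, which stays closer to the data than comparing correlation tables.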

You describe yourself as an R user, but your graphs look to me like Excel defaults. I suggest that your graphs should show bounds of $[-1,1]$ on both axes; shift the $x$ axis with its numeric labels away from the middle of the graph; and use open or hollow symbols such as "o" rather than solid symbols.
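In base R, those suggestions could be sketched as follows. The two vectors of correlations are simulated stand-ins for the region 2 and region 3 values, and the square plotting region and the dashed reference line are additions of this sketch rather than part of the advice above:

```r
set.seed(3)                          # simulated correlations, for illustration only
r2 <- runif(91, -1, 1)               # stand-in for the 91 correlations in region 2
r3 <- pmax(-1, pmin(1, r2 + rnorm(91, sd = 0.2)))   # region 3, kept within [-1, 1]

op <- par(pty = "s")                 # square plotting region
plot(r2, r3,
     xlim = c(-1, 1), ylim = c(-1, 1),   # full range of a correlation on both axes
     pch = 1,                            # open circles
     xlab = "Correlation, region 2", ylab = "Correlation, region 3")
abline(0, 1, lty = 2)                # reference line where the two regions agree
par(op)
```

Base R draws the axes along the bottom and left edges by default, so the numeric labels do not run through the middle of the point cloud.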

P.S. In statistics, parameters and variables are not alternative terms. Your parameters are all variables.

Whether you are a student or a professional, you might benefit from finding a friendly local statistician, or someone in your field with more statistical experience, to talk to.

(LATER) If you are determined to do this, an extension of @Dualinity's approach to parallel coordinate plots might help.
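For a generic parallel coordinate plot of correlations across three regions (not a reconstruction of @Dualinity's approach, which is not reproduced here), one option is `parcoord()` from the MASS package, with `matplot()` as a base R alternative that keeps a common scale. The correlations below are simulated and the object names are hypothetical:

```r
library(MASS)                        # provides parcoord()

set.seed(4)                          # simulated correlations, for illustration only
base <- runif(91, -1, 1)
cors <- sapply(1:3, function(i) pmax(-1, pmin(1, base + rnorm(91, sd = 0.3))))
colnames(cors) <- paste0("region", 1:3)

# One line per variable pair, one vertical axis per region.
# parcoord() rescales each axis to its own min-max range, so the axes are
# not on a common scale; var.label = TRUE prints each axis range.
parcoord(cors, var.label = TRUE)

# Alternative with a shared [-1, 1] scale, using base graphics only.
matplot(t(cors), type = "l", lty = 1, col = "grey40", ylim = c(-1, 1),
        xaxt = "n", xlab = "", ylab = "Correlation")
axis(1, at = 1:3, labels = colnames(cors))
```

Lines that stay roughly horizontal correspond to variable pairs whose correlation is similar in all three regions; lines that cross steeply mark pairs where the regions disagree.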
