Correlation – How to Test for a Significant Difference Between Several Dependent Correlation Coefficients?

correlation

I have a group of correlation coefficients (more than two). They all involve one common variable A, in the form r_A1, r_A2, r_A3, ..., r_Ak, where 1, 2, ..., k denote the other variables; they are all computed on the same sample.

My question is: what statistic should I use when I want to know whether any one of the correlation coefficients differs from any of the others? I know that two dependent correlation coefficients can easily be compared with most statistical tools (a sketch follows below), but here I need multiple comparisons. I know that a chi-square test can be used to test the equality of several correlation coefficients (see http://home.ubalt.edu/ntsbarsh/business-stat/otherapplets/MultiCorr.htm for an example), but to my knowledge that approach is for INDEPENDENT correlation coefficients. So I am wondering: is there an approach, analogous to Fisher's least significant difference, for making comparisons among several dependent correlation coefficients?
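
For the two-coefficient dependent case mentioned above, here is a minimal sketch using Williams's test as implemented in the psych package's r.test function; the correlation values and sample size are hypothetical placeholders:

> library(psych)
> # compare r_A1 with r_A2, where A is the shared variable:
> # r12 = cor(A, V1), r13 = cor(A, V2), r23 = cor(V1, V2)  (hypothetical values)
> r.test(n = 24, r12 = 0.77, r13 = 0.82, r23 = 0.65)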

EDIT: Thanks @russ-lenth for your answer. In general, I found that the CIs computed from lsmeans are wider than those computed using Fisher's Z method. Here's an example of the CIs that I get through the lsmeans function:

 rep.meas      lsmean         SE df   lower.CL  upper.CL
 M1        0.76914236 0.13325688 23  0.4934795 1.0448052
 M2        0.82346705 0.11830361 23  0.5787374 1.0681967
 M3        0.89294217 0.09386717 23  0.6987631 1.0871212
 M4       -0.09985512 0.20747224 23 -0.5290441 0.3293339
 M5        0.56183690 0.17249315 23  0.2050076 0.9186662
 M6        0.79086279 0.12760947 23  0.5268825 1.0548431
 M7        0.14667681 0.20625924 23 -0.2800029 0.5733566

Take M1, whose r = 0.769, as an example: the width of the CI from lsmeans is 1.0448 - 0.4935 = 0.5513, whereas the width of the CI computed from Fisher's Z is 0.8948 - 0.5302 = 0.3646, much narrower than the former. Is the difference between the widths of the two confidence intervals too large?
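
For reference, here is a sketch of how the Fisher's Z interval above can be computed, assuming n = 24 (inferred from the 23 residual df of the no-intercept model, i.e. n - 1 = 23):

> r <- 0.769; n <- 24                       # M1's correlation; n inferred from df
> z <- atanh(r)                             # Fisher's z transform
> se.z <- 1 / sqrt(n - 3)                   # standard error on the z scale
> tanh(z + c(-1, 1) * qnorm(0.975) * se.z)  # back-transformed 95% limits

This essentially reproduces the (0.5302, 0.8948) limits quoted above.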

Best Answer

Well, here's an idea...

First standardize all of the variables (the model outputs as well as the reference variable).

Then fit a multivariate regression with those standardized model outputs as the multivariate response (not as predictors) and the common human-performance variable (also standardized) as the predictor. Do not include an intercept; it would be zero anyway. Because of the standardization, the regression coefficients are then equal to the correlation coefficients, and their covariance matrix is available, so you can estimate each pairwise difference along with its standard error.
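
To see why the slopes equal the correlations: for standardized $x$ and $y$ (mean $0$, SD $1$), the no-intercept least-squares slope is
$$\hat\beta = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{\widehat{\operatorname{cov}}(x,y)}{\widehat{\operatorname{var}}(x)} = r_{xy},$$
since standardization makes $\widehat{\operatorname{var}}(x) = 1$ and $\widehat{\operatorname{cov}}(x,y) = r_{xy}$.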

R example

This uses the swiss dataset provided with R. Here is the standardized dataset:

> swiss.std = as.data.frame(lapply(swiss, function(x) (x-mean(x))/sd(x)))
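
Equivalently, base R's scale() does the same centering and scaling:

> swiss.std = as.data.frame(scale(swiss))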

Note that the covariances of the standardized variables are just the correlations:

> cov(swiss.std[1:4])
             Fertility Agriculture Examination  Education
Fertility    1.0000000   0.3530792  -0.6458827 -0.6637889
Agriculture  0.3530792   1.0000000  -0.6865422 -0.6395225
Examination -0.6458827  -0.6865422   1.0000000  0.6984153
Education   -0.6637889  -0.6395225   0.6984153  1.0000000

Fit the multivariate model, looking at correlations with Fertility

> swiss.mlm = lm(cbind(Agriculture,Examination,Education) ~ Fertility - 1, data = swiss.std)

Here are the coefficients and variance-covariance matrix thereof

> coef(swiss.mlm)
          Agriculture Examination  Education
Fertility   0.3530792  -0.6458827 -0.6637889

> vcov(swiss.mlm)
                      Agriculture:Fertility Examination:Fertility Education:Fertility
Agriculture:Fertility           0.019029024          -0.009967271        -0.008807663
Examination:Fertility          -0.009967271           0.012670338         0.005862729
Education:Fertility            -0.008807663           0.005862729         0.012160529

So to compare, say, the 2nd and 3rd correlations, here's the estimated difference

> con = c(0,1,-1)
> coef(swiss.mlm) %*% con
                [,1]
Fertility 0.01790615

and its SE

> sqrt(sum(con * vcov(swiss.mlm) %*% con))
[1] 0.1144789
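
To turn that estimate and standard error into a test by hand, form the usual t statistic on the model's 46 residual degrees of freedom; this reproduces (without the multiplicity adjustment) the Examination - Education comparison shown further below:

> est <- drop(coef(swiss.mlm) %*% con)             # 0.01790615, as above
> se <- sqrt(sum(con * vcov(swiss.mlm) %*% con))   # 0.1144789, as above
> est / se                                         # t = 0.156, matching the t.ratio below
> 2 * pt(-abs(est / se), df = 46)                  # unadjusted two-sided p-value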

I can trick the lsmeans package into doing it:

> library(lsmeans)
> swiss.lsm = lsmeans(swiss.mlm, "rep.meas")

rep.meas is the default name for the levels of the multivariate response. So far, swiss.lsm just estimates the mean of each response, which is $(0,0,0)$ after standardization; but I'll change the linear functions it uses so that each one is $1$ times the corresponding regression coefficient:

> swiss.lsm@linfct = diag(c(1,1,1))

Now, here is the summary

> swiss.lsm
 rep.meas        lsmean        SE df    lower.CL   upper.CL
 Agriculture  0.3530792 0.1379457 46  0.07540884  0.6307495
 Examination -0.6458827 0.1125626 46 -0.87245946 -0.4193060
 Education   -0.6637889 0.1102748 46 -0.88576050 -0.4418172

and the pairwise comparisons:

> pairs(swiss.lsm)
 contrast                    estimate        SE df t.ratio p.value
 Agriculture - Examination 0.99896189 0.2272309 46   4.396  0.0002
 Agriculture - Education   1.01686804 0.2209183 46   4.603  0.0001
 Examination - Education   0.01790615 0.1144789 46   0.156  0.9866

P value adjustment: tukey method for a family of 3 means 

If I want to compare the correlations in absolute value, I just change the linear function to flip the signs of the two negative coefficients:

> swiss.lsm@linfct = diag(c(1,-1,-1))
> swiss.lsm
 rep.meas       lsmean        SE df   lower.CL  upper.CL
 Agriculture 0.3530792 0.1379457 46 0.07540884 0.6307495
 Examination 0.6458827 0.1125626 46 0.41930596 0.8724595
 Education   0.6637889 0.1102748 46 0.44181722 0.8857605

Confidence level used: 0.95 

> pairs(swiss.lsm)
 contrast                     estimate        SE df t.ratio p.value
 Agriculture - Examination -0.29280352 0.1084658 46  -2.700  0.0257
 Agriculture - Education   -0.31070967 0.1165085 46  -2.667  0.0279
 Examination - Education   -0.01790615 0.1144789 46  -0.156  0.9866

P value adjustment: tukey method for a family of 3 means 