Solved – Significance testing of three or more correlations using Fisher’s transformation


Following on from my earlier posts, as far as I can understand, if I have three correlation coefficients, I will have to test them in pairs to see if there is a significant difference among them.

This means that I would have to use Fisher's transformation to convert each correlation coefficient r to a z score, work out the p value for the difference between each pair of z scores (which the recommended calculators in the earlier posts do, thankfully), and then check whether that p value is above or below my alpha level (0.05) for each pair.

i.e. If 21 to 30 year olds are Age Group 1, 31 to 40 year olds are Age Group 2, and 41 to 50 year olds are Age Group 3, my comparison of the correlations between their shopping habits and weight loss would be:

  • Group 1 vs. Group 2
  • Group 1 vs. Group 3
  • Group 2 vs. Group 3
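For reference, the pairwise procedure described above can be sketched in a few lines of Python (a minimal sketch; the correlations and sample sizes in the usage example are made up):

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples,
    using Fisher's transformation z = atanh(r)."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    # Two-sided p-value from the standard normal, via the error function.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical example: Group 1 (r = 0.45, n = 50) vs. Group 2 (r = 0.30, n = 60).
z, p = compare_correlations(0.45, 50, 0.30, 60)
```

You would repeat this for each of the three pairs and compare each p value against alpha (ideally with a multiple-comparison correction).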

Instead of doing three separate calculations, is there a way to do all these calculations in a single step?

Best Answer

Your question is a perfect example of a regression model with quantitative and qualitative predictors. Specifically, the three age groups -- $1, 2, \& \, 3$ -- are the levels of a qualitative variable, and the quantitative variables are shopping habits and weight loss (I am guessing this because you are calculating correlations).

I must stress that this is a much better way of modeling than calculating separate group-wise correlations, because you use all the data in a single model, so your error estimates (p-values, etc.) will be more reliable. A more technical reason is the resulting higher degrees of freedom in the t statistic for testing the significance of the regression coefficients.

Operating by the rule that a qualitative predictor with $c$ levels can be handled by $c-1$ indicator variables, only two indicator variables, $X_1, X_2$, are needed here, defined as follows:

$$ X_1 = 1 \text{ if person belongs to group 1}; 0 \text{ otherwise} . $$ $$ X_2 = 1 \text{ if person belongs to group 2}; 0 \text{ otherwise}. $$
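As a concrete sketch, the indicator coding might look like this (the group labels are hypothetical):

```python
# Dummy-code three age groups into two indicator variables.
# Group 3 is the baseline: it is represented by X1 == 0 and X2 == 0.
groups = [1, 2, 3, 1, 3, 2]
X1 = [1 if g == 1 else 0 for g in groups]
X2 = [1 if g == 2 else 0 for g in groups]
```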

This implies that group $3$ is represented by $X_1=0, X_2=0$. Represent your response -- shopping habits -- as $Y$, and the quantitative explanatory variable, weight loss, as $W$. You can now fit this linear model:

$$ E[Y]=\beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3W. $$

The obvious question is whether it matters if we swap $W$ and $Y$ (I chose shopping habits as the response arbitrarily). The answer is yes -- the estimates of the regression coefficients will change -- but the test for association between the two variables, conditional on group (here a t-test, which for a single predictor is equivalent to testing the correlation), won't change. Specifically,

$$ E[Y]= \beta_0 + \beta_3W \text{ -- for the third group}, $$ $$ E[Y]= (\beta_0 + \beta_2)+\beta_3W \text{ -- for the second group}, $$ $$ E[Y]= (\beta_0 + \beta_1)+\beta_3W \text{ -- for the first group}. $$ This is equivalent to having three separate lines, one per group, if you plot $Y$ against $W$. Plotting them is a good way to check that what you are testing makes sense (basically a form of EDA and model checking, as long as you mark which observations belong to which group). Three parallel lines indicate no interaction between group and $W$; strong interaction means the lines will cross.
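Here is a minimal sketch of fitting this model by ordinary least squares with plain numpy (the data are simulated under assumed coefficients, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 30 people per age group, weight loss W, shopping score Y.
n = 30
group = np.repeat([1, 2, 3], n)
W = rng.normal(size=3 * n)
# Assumed true model: groups 1 and 2 shift the intercept; common slope 0.8 on W.
Y = 1.0 + 0.5 * (group == 1) - 0.3 * (group == 2) + 0.8 * W \
    + rng.normal(scale=0.5, size=3 * n)

# Design matrix [1, X1, X2, W]; group 3 is the baseline.
X = np.column_stack([
    np.ones(3 * n),
    (group == 1).astype(float),
    (group == 2).astype(float),
    W,
])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0, b1, b2, b3 = beta

# The three fitted lines: intercepts differ by group, slope b3 is shared.
intercepts = {"group 1": b0 + b1, "group 2": b0 + b2, "group 3": b0}
```

In practice you would use a regression routine that also reports standard errors, but the point is that one fit gives you all three lines at once.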

How do you carry out the tests you asked about? Basically, once you fit the model and have the estimates, you need to test some contrasts. Specifically, for your comparisons:

$$ \text{Group 2 vs Group 3: } \beta_2 + \beta_0 - \beta_0 = \beta_2 = 0, $$ $$ \text{Group 1 vs Group 3: } \beta_1 + \beta_0 - \beta_0 = \beta_1 = 0, $$ $$ \text{Group 2 vs Group 1: } \beta_2 + \beta_0 - (\beta_0+\beta_1) = \beta_2 - \beta_1 = 0. $$
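These contrast tests can be sketched directly from the OLS algebra: the t statistic for a contrast $c^\top\beta$ is $c^\top\hat\beta / \sqrt{s^2\, c^\top(X^\top X)^{-1}c}$. A minimal implementation (the simulated data and effect sizes are assumptions for illustration):

```python
import numpy as np
from scipy import stats

def contrast_test(X, y, c):
    """t-test of H0: c @ beta == 0 in the OLS model y = X @ beta + noise."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)           # residual variance estimate
    se = np.sqrt(c @ XtX_inv @ c * s2)     # standard error of the contrast
    t = c @ beta / se
    return t, 2 * stats.t.sf(abs(t), df=n - p)

# Hypothetical data: a real group-1 effect, no group-2 effect.
rng = np.random.default_rng(1)
n = 30
group = np.repeat([1, 2, 3], n)
W = rng.normal(size=3 * n)
y = 1.0 + 2.0 * (group == 1) + 0.8 * W + rng.normal(scale=0.5, size=3 * n)
X = np.column_stack([np.ones(3 * n), group == 1, group == 2, W]).astype(float)

# The three contrasts from the answer, as coefficient vectors over (b0, b1, b2, b3):
t13, p13 = contrast_test(X, y, np.array([0.0, 1.0, 0.0, 0.0]))   # group 1 vs 3
t23, p23 = contrast_test(X, y, np.array([0.0, 0.0, 1.0, 0.0]))   # group 2 vs 3
t12, p12 = contrast_test(X, y, np.array([0.0, 1.0, -1.0, 0.0]))  # group 1 vs 2
```

All three p-values come from one model fit, which is the "single step" the question asked for.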