Solved – Significance of average correlation coefficient

Disclaimer: if you find this question to be too similar to another one, I am happy for it to be merged. However, I did not find a satisfactory answer anywhere else (and do not yet have the "reputation" to comment or upvote), so I thought it would be best to ask a new question myself.

My question is this. For each of 12 human subjects, I have computed a correlation coefficient (Spearman's rho) between 6 levels of an independent variable X, and corresponding observations of a dependent variable Y. (Note: the levels of X are not equal across subjects.) My null hypothesis is that in the general population, this correlation is equal to zero. I have tested this hypothesis in two ways:

1. Using a one-sample t-test on the correlation coefficients obtained from my 12 subjects.
2. By centering my levels of X and observations of Y such that, for each participant, mean(X) = 0 and mean(Y) = 0, and then computing a correlation over the aggregate data (72 levels of X and 72 observations of Y).
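Both approaches can be sketched in Python with NumPy and SciPy, assuming the data are held as one array of X levels and one array of Y observations per subject. The data below are simulated purely for illustration (12 subjects, 6 points each, with arbitrary effect sizes), and using Spearman's rho for the pooled correlation in approach 2 is an assumption, since the question does not say which coefficient was used there.

```python
import numpy as np
from scipy import stats

# Simulated, illustrative data: 12 subjects, 6 levels of X each (levels differ per subject).
rng = np.random.default_rng(0)
xs, ys = [], []
for _ in range(12):
    x = np.sort(rng.uniform(0, 10, size=6))                    # subject-specific levels of X
    y = rng.normal(0, 2) + 0.3 * x + rng.normal(0, 1, size=6)  # subject offset + shared slope + noise
    xs.append(x)
    ys.append(y)

# Approach 1: one Spearman rho per subject, then a one-sample t-test of H0: mean rho = 0.
rhos = np.array([stats.spearmanr(xi, yi)[0] for xi, yi in zip(xs, ys)])
t1, p1 = stats.ttest_1samp(rhos, 0.0)

# Approach 2: center X and Y within each subject, pool all 72 points, correlate once.
x_pooled = np.concatenate([xi - xi.mean() for xi in xs])
y_pooled = np.concatenate([yi - yi.mean() for yi in ys])
rho2, p2 = stats.spearmanr(x_pooled, y_pooled)

print(f"Approach 1: mean rho = {rhos.mean():.3f}, t = {t1:.2f}, p = {p1:.3g}")
# Note: this p-value treats the 72 pooled points as independent observations.
print(f"Approach 2: pooled rho = {rho2:.3f}, p = {p2:.3g}")
```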

Now, from reading about working with correlation coefficients (here and elsewhere), I have started to doubt whether the first approach is valid. In particular, I have seen the following equation pop up in several places, presented (apparently) as a t-test for average correlation coefficients:

$$t = \frac{r}{SE_{r}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

where $r$ would be the average correlation coefficient (let's assume we've obtained it by applying Fisher's transformation to the per-subject coefficients first) and $n$ the number of observations. Intuitively, this seems wrong to me, as it does not include any measure of between-subject variability. In other words, if I had 3 correlation coefficients, I would get the same t-statistic whether they were [0.1, 0.5, 0.9] or [0.45, 0.5, 0.55] or any other set of values with the same mean (and $n = 3$).
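The point about between-subject variability can be made concrete with the two example sets. The sketch below (standard library only) plugs the average r into the quoted formula and, for contrast, computes an ordinary one-sample t statistic on the same three coefficients. It uses the plain arithmetic mean so that both sets share exactly the same average (0.5); averaging in Fisher's z space would give somewhat different averages for these particular sets (about 0.61 and 0.50).

```python
import math
import statistics

def formula_t(r, n):
    """Quoted formula t = r * sqrt(n - 2) / sqrt(1 - r**2): depends only on r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def one_sample_t(values, mu0=0.0):
    """One-sample t statistic (mean - mu0) / (sd / sqrt(k)): uses the spread of the values."""
    k = len(values)
    return (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(k))

wide = [0.1, 0.5, 0.9]
narrow = [0.45, 0.5, 0.55]
for rs in (wide, narrow):
    r_bar = statistics.mean(rs)  # arithmetic mean, 0.5 for both sets
    print(f"{rs}: formula t = {formula_t(r_bar, len(rs)):.3f}, "
          f"one-sample t = {one_sample_t(rs):.3f}")
# formula t is ~0.577 for both sets; one-sample t is ~2.165 (wide) vs ~17.321 (narrow)
```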

I suspect, therefore, that the above equation does not in fact apply to testing the significance of an average of correlation coefficients, but rather to testing the significance of a single correlation coefficient based on $n$ observations of two variables.
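That reading can be checked numerically: for a single Pearson correlation computed from $n$ paired observations, the formula reproduces the two-sided p-value that SciPy's `pearsonr` reports. A minimal sketch with simulated data (the sample size and effect size are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

# One sample of n paired observations of two variables (simulated).
rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)

r, p_scipy = stats.pearsonr(x, y)

# The quoted formula: t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution with n - 2 df.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_formula = 2 * stats.t.sf(abs(t), df=n - 2)

print(f"r = {r:.3f}, t = {t:.3f}, p (pearsonr) = {p_scipy:.6g}, p (formula) = {p_formula:.6g}")
```

Here $n$ counts paired observations within one sample, not subjects, which is why the formula carries no information about how much the coefficients vary between subjects.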

Could anyone here please confirm this intuition or explain why it is wrong? Also, if this formula doesn't apply to my case, does anyone know a/the correct approach? Or perhaps my own test number 2 is already valid? Any help is greatly appreciated (including pointers to previous answers that I may have missed or misinterpreted).

Best Answer

A better approach to analysing this data is to use a mixed-model (a.k.a. mixed effects model, hierarchical model) with subject as a random effect (random intercept or random intercept + slope). To summarize a different answer of mine:

This is essentially a regression that models a single overall relationship while allowing that relationship to differ between groups (the human subjects). This approach benefits from partial pooling and uses your data more efficiently.
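A random-intercept-plus-random-slope model of this kind can be sketched with statsmodels' `mixedlm`, assuming the data are in long format with one row per observation (the column names `subject`, `x` and `y` are hypothetical). The data are simulated for illustration only. The coefficient of interest is the fixed effect on `x`, which estimates the overall relationship while the random effects let each subject's intercept and slope vary; omitting `re_formula` gives a random-intercept-only model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data: 12 subjects x 6 observations.
rng = np.random.default_rng(2)
subject = np.repeat(np.arange(12), 6)
x = rng.uniform(0, 10, size=72)
a = rng.normal(0, 2, size=12)[subject]            # per-subject intercepts
b = (0.3 + rng.normal(0, 0.2, size=12))[subject]  # per-subject slopes
y = a + b * x + rng.normal(0, 1, size=72)
data = pd.DataFrame({"subject": subject, "x": x, "y": y})

# Fixed effect for x; random intercept and random slope for x within each subject.
model = smf.mixedlm("y ~ x", data, groups=data["subject"], re_formula="~x")
result = model.fit()
print(result.summary())
print("overall slope:", result.fe_params["x"], "p-value:", result.pvalues["x"])
```

With only 12 subjects, statsmodels may warn about convergence or a variance estimate on the boundary; in that case the random-intercept-only model is a reasonable fallback.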
