I have a dataset with two sets of variables defining each sample. I am computing pairwise correlations between each variable in the first set and every variable in the second set.
I essentially have $n$ samples. For each sample, I have two sets of variables $[a_1, \dots, a_{m_1}]$ and $[b_1, \dots, b_{m_2}]$ that define that particular sample. Correlating all the variables in $a$ with those in $b$ gives me a final correlation matrix of size $m_1 \times m_2$. In addition, some of the variables within set $a$ may be correlated with each other and some of the variables within set $b$ may also be correlated with each other, but I am not assessing those correlations.
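A minimal NumPy sketch of the setup described above, assuming each set is stored as an $n \times m_1$ (resp. $n \times m_2$) array with samples in rows. The function name `cross_correlation` and the random data are hypothetical. Standardizing each column and multiplying the two blocks gives all $m_1 m_2$ Pearson correlations at once, without computing the within-set correlations.

```python
import numpy as np

def cross_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pearson correlation of every column of `a` with every column of `b`.

    a: shape (n, m1), b: shape (n, m2); rows are samples.
    Returns an (m1, m2) matrix whose [i, j] entry is corr(a_i, b_j).
    """
    n = a.shape[0]
    if b.shape[0] != n:
        raise ValueError("a and b must have the same number of samples")
    # Standardize each column; ddof=1 matches the (n - 1) divisor below
    za = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    zb = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    return za.T @ zb / (n - 1)

rng = np.random.default_rng(0)
a = rng.normal(size=(50, 3))   # hypothetical data: n = 50 samples, m1 = 3
b = rng.normal(size=(50, 4))   # hypothetical data: m2 = 4
r = cross_correlation(a, b)
print(r.shape)  # (3, 4)
```

The result equals the off-diagonal block of `np.corrcoef(a, b, rowvar=False)`, but skips the $a$-$a$ and $b$-$b$ blocks the question does not need.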
I'm trying to figure out:
- if I need to correct for multiple comparisons, and
- if I do, what method I should use.
I have tried Bonferroni, but with the large number of comparisons I get an extremely large adjusted p-value.
Best Answer
You are testing $m_1 m_2$ hypotheses $H_0: \rho = 0$, so the $p$-values obtained from these tests should be adjusted. If you want to control the family-wise error rate (FWER), then Bonferroni is the standard, if severely conservative, way to go.
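For reference, the plain Bonferroni adjustment over all $M = m_1 m_2$ correlations can be sketched as follows. This is a minimal sketch assuming the $p$-values are already in an $m_1 \times m_2$ NumPy array; the function name `bonferroni` and the example $p$-values are hypothetical. It also shows why the adjusted $p$-values in the question get so large: each one is multiplied by $M$ and capped at 1.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Bonferroni-adjust every entry of an (m1, m2) matrix of p-values."""
    pvals = np.asarray(pvals, dtype=float)
    m_tests = pvals.size                          # M = m1 * m2 hypotheses
    adjusted = np.minimum(pvals * m_tests, 1.0)   # adjusted p-values, capped at 1
    reject = adjusted < alpha                     # same as pvals < alpha / M
    return adjusted, reject

# Hypothetical 2 x 2 matrix of p-values (m1 = m2 = 2, so M = 4)
p = np.array([[0.001, 0.02],
              [0.0001, 0.5]])
adj, rej = bonferroni(p)
print(adj)
print(rej)
```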
However, as noted, you are not actually conducting $M = m_1 m_2$ independent tests. This scenario is common in genetics, where experimenters test many genes for an association with a disease but variation in genes that are physically close to each other is correlated. A solution was proposed by Cheverud et al. (1983) to obtain the number of "effective comparisons" $M_{eff}$, so that one can still control the ever-popular FWER without over-correcting. The method is described in this open-access publication. Since you would have to wade through some genetics jargon, I will give you the gist:
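Given standardized data $X$ (each column mean-centered and scaled to unit variance) with dimension $n \times m$ ($n$ samples, $m$ variables), the correlation matrix is the $m \times m$ matrix $Z = \frac{1}{n-1} X^T X$. Its eigenvalues $\lambda_i$, $i \in \{1, \dots, m\}$, can be obtained via eigendecomposition, i.e. principal components analysis (PCA). As explained in the article: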
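A minimal NumPy sketch of this eigendecomposition step, assuming rows are samples and columns are variables; the helper name `correlation_eigenvalues` and the simulated data are hypothetical. Because the correlation matrix is symmetric, `numpy.linalg.eigvalsh` returns real eigenvalues. They always sum to the number of variables, and strongly correlated variables show up as one large eigenvalue plus some small ones.

```python
import numpy as np

def correlation_eigenvalues(x: np.ndarray) -> np.ndarray:
    """Eigenvalues of the correlation matrix of x (rows = samples, columns = variables)."""
    n = x.shape[0]
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each column
    corr = z.T @ z / (n - 1)                           # (m, m) correlation matrix
    return np.linalg.eigvalsh(corr)[::-1]              # eigvalsh sorts ascending; reverse it

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 5))                     # hypothetical data: 100 samples, 5 variables
x[:, 1] = x[:, 0] + 0.1 * rng.normal(size=100)    # make two variables strongly correlated
lam = correlation_eigenvalues(x)
print(lam.round(2))
print(lam.sum())  # equals m = 5 up to floating-point error (trace of the correlation matrix)
```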
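Thus, the adjusted threshold after Bonferroni correction would be $\alpha_{adj}=\frac{\alpha}{M_{eff,a}M_{eff,b}}$, where $M_{eff,a}$ and $M_{eff,b}$ are the effective numbers of tests for sets $a$ and $b$, respectively. The eigenvalues can be calculated in R with functions from the stats package that ships with it (namely `princomp` or `prcomp`, depending on how many variables $a$ and $b$ contain relative to the number of samples: `princomp` requires more samples than variables, whereas `prcomp` does not). To work on the correlation rather than the covariance matrix, use `princomp(x, cor = TRUE)` or `prcomp(x, scale. = TRUE)`; the eigenvalues are the squared `sdev` values.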
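One commonly used estimator of this kind is the variance-of-eigenvalues form $M_{eff} = 1 + (M-1)\left(1 - \frac{\mathrm{Var}(\lambda)}{M}\right)$, where $M$ is the number of variables in a set and $\mathrm{Var}(\lambda)$ is the sample variance of its correlation-matrix eigenvalues. The sketch below assumes that form (it may differ in detail from the article's), then computes $\alpha_{adj}$. The function names `effective_number_of_tests` and `adjusted_alpha` and the simulated data are hypothetical.

```python
import numpy as np

def effective_number_of_tests(x: np.ndarray) -> float:
    """Assumed estimator: M_eff = 1 + (M - 1) * (1 - Var(lambda) / M).

    x has rows = samples, columns = variables; lambda are the eigenvalues
    of the correlation matrix of x.
    """
    m = x.shape[1]
    if m == 1:
        return 1.0                               # a single variable is a single test
    lam = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
    var_lam = np.var(lam, ddof=1)                # sample variance, (M - 1) denominator
    return 1.0 + (m - 1) * (1.0 - var_lam / m)

def adjusted_alpha(a: np.ndarray, b: np.ndarray, alpha: float = 0.05) -> float:
    """alpha_adj = alpha / (M_eff,a * M_eff,b)."""
    return alpha / (effective_number_of_tests(a) * effective_number_of_tests(b))

rng = np.random.default_rng(2)
a = rng.normal(size=(200, 4))                    # hypothetical, roughly independent set
base = rng.normal(size=(200, 1))
b = base + 0.05 * rng.normal(size=(200, 6))      # hypothetical, highly inter-correlated set
meff_a = effective_number_of_tests(a)
meff_b = effective_number_of_tests(b)
print(meff_a, meff_b)
print(adjusted_alpha(a, b), 0.05 / (4 * 6))      # compare with plain Bonferroni over 24 tests
```

With independent variables the eigenvalues are all near 1, so $M_{eff}$ stays close to $M$; with nearly duplicated variables one eigenvalue dominates and $M_{eff}$ falls toward 1, which makes $\alpha_{adj}$ less strict than plain Bonferroni.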