Solved – Permutation test for multiple correlation test statistics

correlation, multiple-comparisons, permutation-test

I have read some great discussions about using permutation tests on correlation matrices to deal with the Type I errors that arise from multiple comparisons; however, I have a question about the correlation test statistic. Specifically, can a permutation test be run on a correlation matrix when several different correlation test statistics are used at once?

For example, I have an interval variable $(X)$ and a set of 93 variables $(Y_{1-93})$ made up of different data types (interval, ordinal, nominal). I'm interested in investigating associations between $X$ and any of the $Y$ variables (I have 45 observations). Before reading about permutation tests, I was going to compute a correlation test statistic for each of the ($X$, $Y_i$) variable pairs and then adjust each p-value using the Holm or Bonferroni method to account for the multiple comparisons. Specifically, I was going to use Pearson's r for my interval-interval pairs, Kendall's tau for my interval-ordinal pairs (because I have a lot of ties), and the F-statistic for my interval-nominal pairs.

However, instead of using Bonferroni or Holm (which I fear would be too conservative), I really like the idea of using a permutation test. I'm just not sure if my approach is correct.

Basically, I'm trying to follow the advice given here. The way I'm doing it is by randomly shuffling the values of my $X$ variable on each permutation and then re-running my multiple correlation tests between $X$ and $Y_{1-93}$ (using Kendall's tau, Pearson's r, and the F-statistic). I'm running 10,000 permutations and then comparing my original test statistic for each pair with the 2.5th and 97.5th percentiles of the resulting permutation distribution of that test statistic. If my observed correlation test statistic falls outside that 2.5-97.5 percentile range, then I'm reporting it as a significant association between $X$ and $Y_i$.

So strictly speaking, my data set isn't a typical $X$ by $X$ correlation matrix; rather it's $X$ by $Y$, where $X$ is my one dependent variable (interval) and $Y$ is a set of 93 independent variables (ordinal, interval, nominal).

All the literature I've read about permutation tests uses only one test statistic (like Pearson's r), so I'm a bit nervous about my approach; however, I don't see a reason why I shouldn't be able to apply multiple test statistics in this kind of 'linear combination' approach. I welcome your thoughts.

Thanks in advance for taking the time to answer my question and I hope I was clear enough in the description of the problem.

(note: I know the F-statistic isn't a correlation test statistic, but it seems like the best statistic to use between interval and nominal pairs)

Best Answer

Why not simulate some data with the same structure as your data, but without any correlations or relationships, and then use that (probably several times) to see how your strategy behaves? If you run a separate permutation test for each of the 93 variables, you will still have a multiple comparisons problem: at the 5% level you should expect about 4-5 of the 93 tests ($93 \times 0.05 \approx 4.65$) to be declared significant purely by chance, even when no real association exists.
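A minimal sketch of such a simulation, assuming for simplicity that all 93 variables are interval and that Pearson's r is the only statistic. The helper names `pearson_r` and `per_variable_perm_pvalues` are made up for illustration, and only 199 permutations are used to keep the run short:

```python
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def per_variable_perm_pvalues(x, ys, n_perm, rng):
    """Two-sided permutation p-value for each (x, y_i) pair.

    No correction for multiple comparisons is applied here.
    """
    observed = [abs(pearson_r(x, y)) for y in ys]
    exceed = [0] * len(ys)
    x_perm = list(x)
    for _ in range(n_perm):
        rng.shuffle(x_perm)  # break any link between x and every y_i
        for i, y in enumerate(ys):
            if abs(pearson_r(x_perm, y)) >= observed[i]:
                exceed[i] += 1
    # The observed arrangement counts as one permutation (the +1 terms).
    return [(1 + e) / (1 + n_perm) for e in exceed]

# Null data with the same shape as in the question: 45 observations,
# 93 variables, and no true association (an assumption of this sketch).
rng = random.Random(0)
n_obs, n_vars = 45, 93
x = [rng.gauss(0, 1) for _ in range(n_obs)]
ys = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

pvals = per_variable_perm_pvalues(x, ys, n_perm=199, rng=rng)
false_positives = sum(p <= 0.05 for p in pvals)
print(false_positives, "of", n_vars, "null pairs flagged at the 0.05 level")
```

Repeating this over many simulated data sets should give a count that averages somewhere near $93 \times 0.05$, although any single run will vary.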

To correct for multiple comparisons you would need to do something more along the lines of combining all your correlation measures (probably transformed to be on some similar scale) and comparing the combined measure to the permutation values. Combinations to consider would be the maximum correlation (absolute value) or the mean correlation.
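The maximum-statistic idea could be sketched roughly as follows. This assumes every pair uses the same statistic (absolute Pearson's r) so that the values are directly comparable; the function names and the toy data are illustrative only:

```python
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def max_stat_adjusted_pvalues(x, ys, n_perm, rng):
    """Adjusted p-values from the permutation distribution of max |r|.

    Each observed |r_i| is compared with the largest |r| found across
    ALL variables in each permutation, not only with its own null values.
    """
    observed = [abs(pearson_r(x, y)) for y in ys]
    exceed = [0] * len(ys)
    x_perm = list(x)
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        max_null = max(abs(pearson_r(x_perm, y)) for y in ys)
        for i, obs in enumerate(observed):
            if max_null >= obs:
                exceed[i] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]

# Toy data (an assumption of this sketch): 10 variables, only the
# first one is truly related to x.
rng = random.Random(1)
n_obs = 45
x = [rng.gauss(0, 1) for _ in range(n_obs)]
ys = [[a + 0.1 * rng.gauss(0, 1) for a in x]]
ys += [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(9)]

adjusted = max_stat_adjusted_pvalues(x, ys, n_perm=199, rng=rng)
print([round(p, 3) for p in adjusted])
```

Because every variable is judged against the same maximum, a variable is flagged only if its statistic beats what the best of all the null variables typically achieves, which is what controls the chance of any false positive.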

Something more along the lines of the FDR would be to compare your strongest correlation to the strongest from the permuted distribution, then compare the 2nd strongest to the 2nd strongest from the permutations, etc.
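One possible reading of that rank-by-rank comparison, again assuming a single statistic (absolute Pearson's r) for all pairs. This is a sketch of the idea only, not a formal FDR procedure, and the names and toy data are hypothetical:

```python
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rankwise_pvalues(x, ys, n_perm, rng):
    """Compare the k-th strongest observed |r| with the k-th strongest
    |r| from each permutation, for every rank k."""
    observed = sorted((abs(pearson_r(x, y)) for y in ys), reverse=True)
    exceed = [0] * len(ys)
    x_perm = list(x)
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        null_sorted = sorted(
            (abs(pearson_r(x_perm, y)) for y in ys), reverse=True
        )
        for k, (obs, null) in enumerate(zip(observed, null_sorted)):
            if null >= obs:
                exceed[k] += 1
    return observed, [(1 + e) / (1 + n_perm) for e in exceed]

# Toy data (an assumption of this sketch): one strong and one moderate
# association, plus eight unrelated variables.
rng = random.Random(2)
n_obs = 45
x = [rng.gauss(0, 1) for _ in range(n_obs)]
ys = [
    [a + 0.1 * rng.gauss(0, 1) for a in x],
    [a + 1.0 * rng.gauss(0, 1) for a in x],
]
ys += [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(8)]

observed, p_by_rank = rankwise_pvalues(x, ys, n_perm=199, rng=rng)
for rank, (r, p) in enumerate(zip(observed, p_by_rank), start=1):
    print(rank, round(r, 3), round(p, 3))
```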

Having a mixture of different correlation measures will complicate this, but you could either analyze the groups separately or convert everything to a similar scale (the p-value would be one such scale) so that the measures are comparable.
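As a rough illustration of putting mixed statistics on the p-value scale, the sketch below turns each statistic into a permutation p-value within its own column and then compares each observed p-value with the smallest p-value in every permutation. It assumes only two statistic types (absolute Pearson's r and a one-way F statistic; Kendall's tau is left out for brevity), and all names and data are made up for the example:

```python
import random
from bisect import bisect_left

def abs_pearson(x, y):
    """|Pearson r| for an interval-interval pair."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def f_statistic(x, groups):
    """One-way ANOVA F statistic of x across the levels of a nominal variable."""
    n = len(x)
    grand_mean = sum(x) / n
    by_level = {}
    for value, level in zip(x, groups):
        by_level.setdefault(level, []).append(value)
    ss_between = ss_within = 0.0
    for values in by_level.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    k = len(by_level)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def min_p_adjusted(x, variables, n_perm, rng):
    """variables: list of (statistic_function, y) pairs, where a larger
    statistic means a stronger association. Returns adjusted p-values."""
    x_perm = list(x)
    rows = [[stat(x, y) for stat, y in variables]]  # row 0 = observed data
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        rows.append([stat(x_perm, y) for stat, y in variables])
    n_rows = len(rows)
    # Put every statistic on the p-value scale, column by column.
    p_rows = [[0.0] * len(variables) for _ in range(n_rows)]
    for j in range(len(variables)):
        column = sorted(row[j] for row in rows)
        for b, row in enumerate(rows):
            n_ge = n_rows - bisect_left(column, row[j])  # values >= this one
            p_rows[b][j] = n_ge / n_rows
    # Compare each observed p-value with the smallest p-value in each row.
    min_p = [min(r) for r in p_rows]
    return [sum(m <= p for m in min_p) / n_rows for p in p_rows[0]]

rng = random.Random(3)
n_obs = 45
x = [rng.gauss(0, 1) for _ in range(n_obs)]
variables = [
    (abs_pearson, [a + 0.1 * rng.gauss(0, 1) for a in x]),   # planted association
    (abs_pearson, [rng.gauss(0, 1) for _ in range(n_obs)]),  # null interval variable
    (f_statistic, [i % 3 for i in range(n_obs)]),            # null nominal variable
]
adjusted = min_p_adjusted(x, variables, n_perm=199, rng=rng)
print([round(p, 3) for p in adjusted])
```

Note that with this approach the smallest attainable adjusted p-value depends on both the number of permutations and the number of variables, so with 93 variables many more than 199 permutations would likely be needed.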