Hypothesis Testing – How to Choose the Best Test Statistic for Permutation Test?

hypothesis testing

Is the purpose of a permutation test to test the null hypothesis that several groups of samples come from the same distribution?

I found the following description of its steps:

The steps in a permutation-based computation of the significance level
of a test statistic are as follows:

i) Choose a test statistic, e.g. a t-score for a comparison of two groups,

ii) Compute the test statistic for the gene of interest,

iii) Permute the labels on samples at random, and re-compute the test statistic for the rearranged labels; repeat for a large number (perhaps 1,000) of permutations, and finally,

iv) Compute the fraction of cases in which the test statistics from iii) exceed the real test statistic from ii).
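
The four steps above can be sketched in Python as follows. This is only a minimal illustration, not the quoted source's implementation: the function name `permutation_p_value`, the `mean_difference` statistic, and the data values are all hypothetical. It also departs from the quoted wording in two small ways, both labelled in the comments: it counts permuted statistics that are greater than or equal to the observed one (rather than strictly exceeding it), and it adds 1 to numerator and denominator so the estimated p-value is never exactly zero.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, statistic, n_permutations=1000, seed=0):
    """One-sided Monte Carlo permutation p-value for statistic(a, b)."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)        # step ii: statistic on the real labels
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):               # step iii: relabel at random, recompute
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:   # ">=" is an assumption
            at_least_as_extreme += 1
    # step iv: fraction of permutations at least as extreme as the real data,
    # with a +1 correction (an assumption, not part of the quoted steps)
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def mean_difference(a, b):
    """Step i: the chosen test statistic (here, a difference of group means)."""
    return mean(a) - mean(b)


# Hypothetical measurements for two groups
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]

p = permutation_p_value(treated, control, mean_difference)
print(f"permutation p-value: {p:.4f}")
```

Because `statistic` is passed in as a function, any other two-sample statistic can be swapped in without changing the rest of the procedure.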

What kinds of test statistic should one choose in the first step?

The example uses the t-score, which measures the difference between two groups. But it seems to me that any statistic would work, not necessarily one that measures the difference between two groups. Is that correct?

Thanks and regards!

Best Answer

Often several statistics will all give the same p-value. For example, in the two-sample case, the difference of the two means, the mean of group A, and the sum of the values in group A all yield the same p-value. This is because, given the pooled data values and the sample sizes, the first two can be calculated from the third alone; each is an increasing function of it, so every permutation is ranked the same way by all three. I would expect the t statistic to behave similarly to any of the above, though it may not give exactly the same p-value (because of the division by the standard deviation(s)). Other statistics, such as the difference of the two medians or the ratio of the two variances, could give very different results, because the permutation process affects them differently.
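
To make the equivalence concrete, here is a small sketch that enumerates every possible relabelling of a tiny data set and computes a one-sided p-value for four statistics. The function name `exact_permutation_p` and the data are made up for illustration. With these assumptions, the sum of A, the mean of A, and the difference of means produce identical p-values, while the difference of medians is free to give a different one.

```python
from itertools import combinations
from statistics import mean, median


def exact_permutation_p(group_a, group_b, statistic):
    """Exact one-sided p-value: enumerate every way to relabel the pooled data."""
    pooled = list(group_a) + list(group_b)
    observed = statistic(group_a, group_b)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        hits += statistic(a, b) >= observed   # count relabellings at least as extreme
        total += 1
    return hits / total


statistics_to_compare = {
    "sum of A": lambda a, b: sum(a),
    "mean of A": lambda a, b: mean(a),
    "difference of means": lambda a, b: mean(a) - mean(b),
    "difference of medians": lambda a, b: median(a) - median(b),
}

# Hypothetical integer data; group A contains one outlier
group_a = [7, 9, 10, 12, 30]
group_b = [5, 6, 8, 8, 11]

p_values = {
    name: exact_permutation_p(group_a, group_b, stat)
    for name, stat in statistics_to_compare.items()
}
for name, p_value in p_values.items():
    print(f"{name:>22}: p = {p_value:.4f}")
```

The equivalence shown here is for a one-sided test; a two-sided version would compare absolute differences, and the raw sum of A would first need to be centred to stay equivalent.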

Your choice should be based on a combination of two things: what is most interesting given the science and the question being asked (sometimes medians are of more interest, other times means), and what will give you power to detect a difference under reasonable, meaningful alternatives. You can check the latter by simulating data from cases that you think are likely or interesting and seeing how the statistics perform.
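
One possible way to run such a simulation is sketched below. Everything specific in it is an assumption chosen for illustration: the alternative (group A is normal with its mean shifted by 1.5 standard deviations), the sample size, the number of simulations and permutations, and the names `permutation_p` and `estimate_power`. The idea is simply to generate many data sets under an alternative you care about and record how often each candidate statistic leads to rejection.

```python
import random
from statistics import median


def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


def median_diff(a, b):
    return median(a) - median(b)


def permutation_p(a, b, statistic, rng, n_permutations=200):
    """One-sided Monte Carlo permutation p-value (with a +1 correction)."""
    observed = statistic(a, b)
    pooled = a + b                      # new list, so a and b are not modified
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        hits += statistic(pooled[:len(a)], pooled[len(a):]) >= observed
    return (hits + 1) / (n_permutations + 1)


def estimate_power(statistic, shift, n_per_group=12, n_sims=100, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Assumed alternative: unit-variance normal data, group A shifted by `shift`
        a = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        rejections += permutation_p(a, b, statistic, rng) <= alpha
    return rejections / n_sims


power_mean = estimate_power(mean_diff, shift=1.5)
power_median = estimate_power(median_diff, shift=1.5)
print(f"estimated power, difference of means:   {power_mean:.2f}")
print(f"estimated power, difference of medians: {power_median:.2f}")
```

Replacing the `rng.gauss` calls with draws from a heavy-tailed or skewed distribution is the natural way to see whether a different statistic becomes preferable under that kind of alternative.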
