Solved – How is the exact permutation test procedure carried out: iterating over permutations or using combinations of one group

combinatoricshypothesis testingp-valuepermutation-test

I've tried to find an article that explains the procedure of permutation tests for the exhaustive sampling of all permutations (not the monte carlo method) and couldn't find a resource that was specific enough to help me with the ambiguity outlined below. For example, the Wikipedia article (https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests) says

For example, given a combined dataset (1, 2, 3) where group A has length 2 and group B has length 1 for simplicity, it is not clear to me whether "all possible ways to divide it" means {(1, 2), (3)} and {(2, 1), (3), …} or if they count "{(1, 2), (3)}" and "{(2, 1), (3)}" as the same division.

Looking at various code examples, for example, the Python, R, Julia, etc. examples on https://rosettacode.org/wiki/Permutation_test, I see that the permutation test is usually implemented as follows:

Given two samples A and B

Record the test statistic (e.g., $|\bar{A} – \bar{B}|$)
Combine samples A and B into one large sample AB
for combination of length(A) from AB:
3a) compute permutation statistic (e.g., $|\bar{A'} – \bar{B'}|$, where A' is the combination from 3. and B' are all the samples in AB that are not in A').
3b) record permutation statistic
Compute p-value as the proportion the permutation statistic from 3a) was more extreme than the test statistic from 1. divided by the number of combinations sampled

However, shouldn't we be sampling the permutations instead of combinations of length A? For example, as outlined below (I highlighted the difference to the previous procedure in bold):

Given two samples A and B

Record the test statistic (e.g., $|\bar{A} – \bar{B}|$)
Combine samples A and B into one large sample AB
for permutation of length(AB) from AB:
3a) compute permutation statistic (e.g., $|\bar{A'} – \bar{B'}|$, where A' are the first len(A) samples in the permuted AB sequence and B' are the remaining samples in AB
Compute p-value as the proportion the permutation statistic from 3a) was more extreme than the test statistic from 1. divided by the number of permutations sampled

Or, to provide a simple numeric example, consider the following 2 samples

a = [1, 3]
b = [2]

with the observed difference:
obs = |mean(a) – mean(b)| = 2

Using the "combinations" procedure, we would be sampling the following:

(1, 2), (3) => diff 0
(1, 3), (2) => diff 2
(2, 3), (1) => diff 4

where in 2 out of 3 cases, we would observe a difference equal or more extreme than in the observed statistic (i.e., p=2/3)

Now, using permutations, we would get the following:

(1, 2), (3) => diff 0
(1, 3), (2) => diff 2
(2, 1), (3) => diff 0
(2, 3), (1) => diff 4
(3, 1), (2) => diff 2
(3, 2), (1) => diff 4

Here, we observe a difference that is equal or more extreme than the observed statistic in 4 out of 6 cases (p=4/6)

Does anyone no more about the exact procedure and has a reliable resource at hand? Thanks!

Best Answer

Whether or not you take the combinations or permutations doesn't actually affect your results, as the number of permutations of $n_{A}$ specific objects in $A$ and $n_{B}$ specific objects in $B$ is the same for all combinations of $x_{1} ... x_{n_{A}}$ and $x_{n_{A+1}} ... x_{n_{A} + n_{B}}$ since the size of each set doesn't change.

That is, for each and any given combination, you will get $n_{A}! \times n_{B}!$ times as many permutations than combinations regardless of the values inside each set. And as the value of the result (the difference between group means) does not change between permutations of the same combination the frequency of each specific result will be scaled equally when taking the permutation. So when calculating the quantiles practically it makes no difference using combinations or permutations. In fact you empirically proved it for the case of $n_{A} = 1$ and $n_{b} = 2$, the frequency of each result, $D = 0,2,4$, is just scaled by $2$ when taking permutations resulting in the quantile values being the same.

Let's assume the standard scenario where samples are independent, and we want to test if two samples come from the same distribution (null hypothesis) based on the difference in sample means

To be technical if you want to test this specific hypothesis I think it is more strictly "correct" to take the complete set of permutations (not combinations) of each set, as the distribution assumption under the null that group labels don't matter, is essentially allowing each $x_{i.}$ to take every value in the presence of every other $x_{j \neq i.}$, which combinations do not allow for.

But again, the results of the quantiles for the empirical distribution are the same since the frequency of each result is just scaled by the same amount $n_{A}! \times n_{B}!$, so practically it doesn't matter.

Related Solutions

Solved – How do we create a confidence interval for the parameter of a permutation test

It's OK to use permutation resampling. It really depends on a number of factors. If your permutations are a relatively low number then your estimation of your confidence interval is not so great with permutations. Your permutations are in somewhat of a gray area and probably are fine.

The only difference from your prior code is that you'd generate your samples randomly instead of with permutations. And, you'd generate more of them, let's say 1000 for example. Get the difference scores for your 1000 replications of your experiment. Take the cutoffs for the middle 950 (95%). That's your confidence interval. It falls directly from the bootstrap.

You've already done most of this in your example. dif.treat is 462 items long. Therefore, you need the lower 2.5% and upper 2.5% cut offs (about 11 items in on each end).

Using your code from before...

y <- sort(dif.treat)
ci.lo <- y[11]
ci.hi <- y[462-11]

Off hand I'd say that 462 is a little low but you'll find a bootstrap to 10,000 comes out with scores that are little different (likely closer to the mean).

Thought I'd also add in some simple code requiring the boot library (based on your prior code).

diff <- function(x,i) mean(x[i[6:11]]) - mean(x[i[1:5]])
b <- boot(total, diff, R = 1000)
boot.ci(b)

Hypothesis Testing – How to Choose the Best Test Statistic for Permutation Test?

Often there are several statistics that will all result in the same p-value/result. For example in a 2 sample case the difference of the 2 means, the mean of group A, and the sum of the values in group A will all result in the same p-value (this is because given the data values and sample sizes you can calculate the 1st 2 given only the 3rd). I would expect the t statistic to be similar to any of the above, but may not be exactly the same (due to the dividing by standard deviation(s)). There are other statistics that could be very different in the results, possibly the difference of the 2 medians, or the ratio of the 2 variances. These other statistics will be affected differently by the permutation process.

Your choice should be based on a combination of what is most interesting based on the science and question being asked (sometimes medians might be of more interest, other times means would be) and what will give you power to detect a difference in reasonable/meaningful alternatives. You can test this later by simulating data from cases that you think likely or interesting and watching how the statistics perform.

Best Answer

Related Solutions

Solved – How do we create a confidence interval for the parameter of a permutation test

Hypothesis Testing – How to Choose the Best Test Statistic for Permutation Test?

Related Question