Here is my take on it, based on Chapter 16 of Efron and Tibshirani's An Introduction to the Bootstrap (pages 220-224). The short of it is that your second bootstrap algorithm was implemented incorrectly, but the general idea is correct.
When conducting bootstrap tests, one has to make sure that the resampling method generates data consistent with the null hypothesis. I'll use the sleep data in R to illustrate this post. Note that I am using the studentized test statistic rather than just the difference of means, as recommended by the textbook.
The classical t-test, which uses an analytical result to obtain information about the sampling distribution of the t-statistic, yields the following result:
x <- sleep$extra[sleep$group==1]
y <- sleep$extra[sleep$group==2]
t.test(x,y)
t = -1.8608, df = 17.776, p-value = 0.07939
One approach is similar in spirit to the more well-known permutation test: samples are taken from the entire set of observations while ignoring the grouping labels. Then the first $n_1$ observations are assigned to the first group and the remaining $n_2$ to the second group.
# pooled sample, assumes equal variance
pooled <- c(x, y)
boot.t <- numeric(10000)
for (i in 1:10000){
  sample.index <- sample(1:length(pooled), replace = TRUE)
  sample.x <- pooled[sample.index][1:length(x)]
  sample.y <- pooled[sample.index][-(1:length(x))]
  boot.t[i] <- t.test(sample.x, sample.y)$statistic
}
p.pooled <- (1 + sum(abs(boot.t) >= abs(t.test(x,y)$statistic))) / (10000+1)
p.pooled
[1] 0.07929207
However, this algorithm actually tests whether the distributions of x and y are identical. If we are simply interested in whether or not their population means are equal, without making any assumptions about their variances, we should generate data under $H_0$ in a slightly different manner. You were on the right track with your approach, but your translation of $H_0$ differs a bit from the one proposed in the textbook. To generate data under $H_0$, we subtract the first group's mean from the observations in the first group and then add the common, or pooled, mean $\bar{z}$. For the second group we do the same thing.
$$ \tilde{x}_i = x_i - \bar{x} + \bar{z} $$
$$ \tilde{y}_i = y_i - \bar{y} + \bar{z}$$
This becomes more intuitive when you calculate the means of the new variables $\tilde{x}$ and $\tilde{y}$. Subtracting the respective group means centres the observations around zero; adding the overall mean $\bar{z}$ then centres each sample around the overall mean. In other words, we have transformed the observations so that both groups share the same mean, namely the overall mean of both groups together, which is exactly $H_0$.
# sample from H0 separately, no assumption about equal variance
xt <- x - mean(x) + mean(sleep$extra)
yt <- y - mean(y) + mean(sleep$extra)
boot.t <- numeric(10000)
for (i in 1:10000){
  sample.x <- sample(xt, replace = TRUE)
  sample.y <- sample(yt, replace = TRUE)
  boot.t[i] <- t.test(sample.x, sample.y)$statistic
}
p.h0 <- (1 + sum(abs(boot.t) >= abs(t.test(x,y)$statistic))) / (10000+1)
p.h0
[1] 0.08049195
This time around, we ended up with similar p-values for all three approaches.
Whether you take the combinations or the permutations does not actually affect your results: for every combination that splits the pooled sample into $n_{A}$ specific objects in $A$ and $n_{B}$ specific objects in $B$, the number of permutations producing that same split is identical for all combinations of $x_{1} \dots x_{n_{A}}$ and $x_{n_{A}+1} \dots x_{n_{A} + n_{B}}$, since the size of each set doesn't change.
That is, for any given combination, you will get $n_{A}! \times n_{B}!$ times as many permutations as combinations, regardless of the values inside each set. And since the value of the result (the difference between group means) does not change between permutations of the same combination, the frequency of each specific result is scaled equally when taking permutations. So, when calculating the quantiles, it practically makes no difference whether you use combinations or permutations. In fact, you empirically proved this for the case of $n_{A} = 1$ and $n_{B} = 2$: the frequency of each result, $D = 0, 2, 4$, is simply scaled by $2$ when taking permutations, so the quantile values are the same.
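The scaling argument can be checked directly. The sketch below uses hypothetical data (not the values from the original question) with $n_{A} = 1$ and $n_{B} = 2$: each difference of means appears once over the combinations and $n_{A}! \times n_{B}! = 2$ times over the full set of permutations, so the two empirical distributions coincide.

```r
# hypothetical pooled sample with n_A = 1, n_B = 2
z <- c(1, 3, 5)
n.A <- 1

# all combinations: which single element goes into group A
comb.d <- sapply(1:3, function(i) z[i] - mean(z[-i]))

# all 3! permutations of the pooled sample; the first n.A positions form group A
perms <- rbind(c(1,2,3), c(1,3,2), c(2,1,3),
               c(2,3,1), c(3,1,2), c(3,2,1))
perm.d <- apply(perms, 1, function(p)
  mean(z[p][1:n.A]) - mean(z[p][-(1:n.A)]))

table(comb.d)  # each difference appears once
table(perm.d)  # each difference appears n.B! = 2 times

# the empirical distribution functions are identical
grid <- c(-3, 0, 3)
all(ecdf(comb.d)(grid) == ecdf(perm.d)(grid))
```

Because every frequency is multiplied by the same constant, any quantile computed from the empirical distribution is unchanged.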
Let's assume the standard scenario where the samples are independent and we want to test whether the two samples come from the same distribution (the null hypothesis) based on the difference in sample means.
To be technical, if you want to test this specific hypothesis, I think it is more strictly "correct" to take the complete set of permutations (not combinations) of each set, since the distributional assumption under the null, that group labels don't matter, essentially allows each $x_{i}$ to take every value in the presence of every other $x_{j \neq i}$, which combinations do not allow for.
But again, the quantiles of the empirical distribution are the same, since the frequency of each result is scaled by the same factor $n_{A}! \times n_{B}!$, so in practice it doesn't matter.
Often there are several statistics that will all result in the same p-value. For example, in a two-sample case, the difference of the two means, the mean of group A, and the sum of the values in group A will all result in the same p-value (this is because, given the data values and sample sizes, you can calculate the first two from the third alone). I would expect the t-statistic to behave similarly to any of the above, but it may not be exactly the same (due to the division by the standard deviation(s)). Other statistics could give very different results, for example the difference of the two medians or the ratio of the two variances. These other statistics are affected differently by the permutation process.
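To see the equivalence concretely, here is a sketch with hypothetical integer-valued data (not from the original question). Under permutation of a fixed pooled sample, the difference of means, the mean of group A, and the sum of group A are increasing functions of one another, so their one-sided permutation p-values coincide exactly when the same set of permutations is reused for each statistic.

```r
# hypothetical data; integers keep all sums exact
A <- c(5, 6, 8, 9, 7)
B <- c(1, 3, 2, 4, 0)
pooled <- c(A, B)
nA <- length(A)

set.seed(1)
idx <- replicate(2000, sample(length(pooled)))  # reuse the same permutations

p.perm <- function(stat) {
  obs <- stat(A, B)
  perm <- apply(idx, 2, function(p)
    stat(pooled[p][1:nA], pooled[p][-(1:nA)]))
  mean(perm >= obs)  # one-sided (upper-tail) p-value
}

p1 <- p.perm(function(a, b) mean(a) - mean(b))
p2 <- p.perm(function(a, b) mean(a))
p3 <- p.perm(function(a, b) sum(a))
c(p1, p2, p3)  # all three are identical
```

Swapping in `median(a) - median(b)` or `var(a) / var(b)` as the statistic would generally give a different p-value, which is the point made above.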
Your choice should be based on a combination of what is most interesting given the science and the question being asked (sometimes medians are of more interest, other times means) and what will give you power to detect a difference under reasonable/meaningful alternatives. You can check this later by simulating data from cases that you think likely or interesting and watching how the statistics perform.
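One way to carry out that check is a small power simulation. The sketch below uses entirely hypothetical settings (normal data, a mean shift of 1, n = 15 per group, 5% level): it repeatedly simulates data under the alternative, runs a two-sided permutation test with the chosen statistic, and records the rejection rate.

```r
# estimate the power of a permutation test for a given statistic
# (all settings here are illustrative assumptions)
power.sim <- function(stat, n.sim = 200, n = 15, shift = 1, n.perm = 200) {
  rejections <- replicate(n.sim, {
    a <- rnorm(n, mean = shift)  # alternative: group means differ by `shift`
    b <- rnorm(n)
    pooled <- c(a, b)
    obs <- stat(a, b)
    perm <- replicate(n.perm, {
      p <- sample(2 * n)
      stat(pooled[p][1:n], pooled[p][-(1:n)])
    })
    mean(abs(perm) >= abs(obs)) < 0.05  # two-sided rejection at the 5% level
  })
  mean(rejections)  # estimated power
}

set.seed(42)
power.sim(function(a, b) mean(a) - mean(b))      # difference of means
power.sim(function(a, b) median(a) - median(b))  # difference of medians
```

Comparing the two estimates shows which statistic has more power under this particular alternative; repeating the exercise with heavier-tailed simulated data would typically favour the medians.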