Permutation Test – Randomisation/Permutation Test for Paired Vectors in R

permutation-testrstatistical significance

I'm not an expert, so forgive me if some of the terminology is a little clumsy. Happy to provide more information where required.

I have two vectors of 50 paired numeric values in R. I want to perform a two-tailed randomisation or permutation test to determine whether their differences are due to chance or not.

A permutation test (also called a randomization test, re-randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.

I want to do this type of test because I believe the distributions of the values in the vectors violate the assumptions of other tests such as the t-test (for example, many of the numeric values in the vector are 0).

The permtest function in the BHH2 library, almost does what I want, but it operates on all $2^{50}$ permutations, which will take too long. Instead, I want to estimate the p-value, by sampling a large number of the possible permutations. I had a look in the coin package, but nothing in there seems to do a permutation test with sampling from paired numeric vectors.

Some googling lead me to this email, which suggests that the reason I can't find a package to do it is that it's a one-liner in R. Unfortunately, I'm not experienced enough with R to be able to produce that one-liner.

Is there a package or method that will perform a two-tailed paired permutation test using only a sample of the permutation space?

If not, would somebody be able to share a short bit of R code to do it?

Best Answer

Though I pointed in comments to the use of the coin package I think it's worth illustrating that a permutation/randomization test is really quite simple, so I have done it.

Here I write some R code to do a randomization test for a one sample test of location. The test randomly flips signs on the differences and computes the mean; this is equivalent to randomly assigning each pair of values to the x and y groups. The code below could be made significantly shorter (I could do it in two lines easily enough, or even one if you didn't mind slower code).

This code takes a few seconds on my machine:

# assumes the two samples are in 'x' and 'y' and x[i] and y[i] are paired
# set up:
B <- 99999
d <- x-y
m0 <- mean(d)

# perform a one-sample randomization test on d
# for the null hypothesis H0: mu_d = 0   vs H1 mu_d != 0  (i.e. two tailed)
# here the test statistic is the mean
rndmdist <- replicate(B,mean((rbinom(length(d),1,.5)*2-1)*d))

# two tailed p-value:
sum( abs(rndmdist) >= abs(m0))/length(rndmdist)

That's the whole thing.

Note that rbinom(length(d),1,.5)*2-1) gives a random -1 or 1 ... i.e. a random sign, so when we multiply by any set of signed d, it is equivalent to randomly assigning + or - signs to the absolute differences. [It doesn't matter what distribution of signs on d you start with, now the d will have random signs.]

Here, I compare it with a t-test on some made up data:

 set.seed(seed=438978)
 z=rnorm(50,10,2)
 x=z-rnorm(50,0,.5)
 y=z+.4+rnorm(50,0,.5)
 t.test(y-x) # gives p = 0.003156

 B <- 99999
 d <- x-y
 m0 <- mean(d)
 rndmdist <- replicate(B,mean((rbinom(length(d),1,.5)*2-1)*d))
 sum( abs(rndmdist) >= abs(m0))/length(rndmdist) 

When the t-test is valid it usually gives a very similar p-value to the completely enumerated permutation test, and a simulated p-value as above (when the number of simulations is sufficiently large) will converge to that second p-value.

At the number of replications used above, a true permutation p-value (i.e. from complete enumeration) of 0.05 will be estimated to within 0.001 (that is, will give a randomization p-value between 0.049 and 0.051) about 85% of the time and to within 0.002 over 99.5% of the time.