Permutation Tests – Which Permutation Test Implementation in R to Use Instead of T-Tests (Paired and Non-Paired)?

nonparametric, permutation-test, r, t-test

I have data from an experiment that I analyzed using t-tests. The dependent variable is interval scaled and the data are either unpaired (i.e., 2 groups) or paired (i.e., within-subjects).
E.g. (within subjects):

x1 <- c(99, 99.5, 65, 100, 99, 99.5, 99, 99.5, 99.5, 57, 100, 99.5, 
        99.5, 99, 99, 99.5, 89.5, 99.5, 100, 99.5)
y1 <- c(99, 99.5, 99.5, 0, 50, 100, 99.5, 99.5, 0, 99.5, 99.5, 90, 
        80, 0, 99, 0, 74.5, 0, 100, 49.5)

However, the data are not normal, so one reviewer asked us to use something other than the t-test. Moreover, the data are not only non-normal: the distributions also differ between conditions.

Therefore, the usual nonparametric tests, the Mann-Whitney-U-Test (unpaired) and the Wilcoxon Test (paired), cannot be used as they require equal distributions between conditions. Hence, I decided that some resampling or permutation test would be best.

Now, I am looking for an R implementation of a permutation-based equivalent of the t-test, or any other advice on what to do with the data.

I know that there are some R packages that can do this for me (e.g., coin, perm, exactRankTests, etc.), but I don't know which one to pick. So, if somebody with some experience using these tests could give me a kick-start, that would be ubercool.

UPDATE: It would be ideal if you could provide an example of how to report the results from this test.

Best Answer

It shouldn't matter much which package you pick, since the test statistic will always be the difference in means (or something equivalent). Small differences between the results can come from how the Monte Carlo sampling is implemented. Trying the three packages with your data, using a one-sided test for two independent samples:

DV <- c(x1, y1)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))
library(coin)                    # for oneway_test(), pvalue()
pvalue(oneway_test(DV ~ IV, alternative="greater", 
                   distribution=approximate(B=9999)))
[1] 0.00330033

library(perm)                    # for permTS()
permTS(DV ~ IV, alternative="greater", method="exact.mc", 
       control=permControl(nmc=10^4-1))$p.value
[1] 0.003

library(exactRankTests)          # for perm.test()
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.003171822
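
The small differences between these p-values come from random sampling of permutations. As a rough base-R sketch of what a Monte Carlo approximation does (not how any of the three packages is implemented internally): shuffle the group membership, recompute the difference in means, and count how often it is at least as large as the observed one. The helper name perm_mc_pvalue and the seed are made up for illustration, and counting the observed arrangement as one of the permutations is one common convention, assumed here.

```r
# Hypothetical helper, for illustration only: Monte Carlo permutation test
# for the difference in means of two independent groups (one-sided, "greater").
perm_mc_pvalue <- function(a, b, B = 9999) {
  dv   <- c(a, b)
  nA   <- length(a)
  obs  <- mean(a) - mean(b)               # observed difference in means
  sims <- replicate(B, {
    idx <- sample(length(dv), nA)         # random choice of who is in group A
    mean(dv[idx]) - mean(dv[-idx])
  })
  # count the observed arrangement as one permutation (the "+1")
  (sum(sims >= obs) + 1) / (B + 1)
}

x1 <- c(99, 99.5, 65, 100, 99, 99.5, 99, 99.5, 99.5, 57, 100, 99.5,
        99.5, 99, 99, 99.5, 89.5, 99.5, 100, 99.5)
y1 <- c(99, 99.5, 99.5, 0, 50, 100, 99.5, 99.5, 0, 99.5, 99.5, 90,
        80, 0, 99, 0, 74.5, 0, 100, 49.5)

set.seed(1)                               # arbitrary seed, for reproducibility
p_mc <- perm_mc_pvalue(x1, y1)
p_mc
```

Different seeds or a different B will change the last digits, which is the kind of variation seen between the packages.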

To check the exact p-value with a manual calculation of all permutations, I'll restrict the data to the first 9 values.

x1 <- x1[1:9]
y1 <- y1[1:9]
DV <- c(x1, y1)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))
pvalue(oneway_test(DV ~ IV, alternative="greater", distribution="exact"))
[1] 0.0945907

permTS(DV ~ IV, alternative="greater", exact=TRUE)$p.value
[1] 0.0945907

# perm.test() gives different result due to rounding of input values
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.1029412

# manual exact permutation test
idx  <- seq_along(DV)                 # indices to permute
idxA <- combn(idx, length(x1))        # all possibilities for different groups

# function to calculate difference in group means given index vector for group A
getDiffM <- function(x) { mean(DV[x]) - mean(DV[!(idx %in% x)]) }
resDM    <- apply(idxA, 2, getDiffM)  # difference in means for all permutations
diffM    <- mean(x1) - mean(y1)       # empirical difference in group means

# p-value: proportion of mean differences at least as large as the observed one
(pVal <- sum(resDM >= diffM) / length(resDM))
[1] 0.0945907
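
Full enumeration is feasible here only because the data were cut down to 9 values per group. A quick check, using base R's choose(), of how many group assignments combn() has to generate in the reduced example compared with the original 20 + 20 values (the variable names are made up for illustration):

```r
# number of ways to pick group A from the pooled data
n_small <- choose(18, 9)    # 9 + 9 values, as in the reduced example
n_full  <- choose(40, 20)   # 20 + 20 values, as in the original data

n_small                     # 48620
n_full                      # on the order of 1e11, too many to enumerate this way
```

This is why the full data set was analysed with a Monte Carlo approximation rather than with combn().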

coin and exactRankTests are both from the same author, but coin seems to be more general and extensive - also in terms of documentation. exactRankTests is not actively developed anymore. I'd therefore choose coin (also because of informative functions like support()), unless you don't like to deal with S4 objects.

EDIT: for two dependent samples (paired data), the syntax is

id <- factor(rep(seq_along(x1), 2))   # factor for participant
pvalue(oneway_test(DV ~ IV | id, alternative="greater",
                   distribution=approximate(B=9999)))
[1] 0.00810081
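
The paired case can also be checked by hand. Under the null hypothesis the sign of each within-participant difference is arbitrary, so an exact test enumerates all 2^n sign patterns. Below is a minimal base-R sketch for the first 9 pairs; the variable names are made up for illustration, and this is not meant to reproduce coin's internal algorithm. It uses the sum of the differences as the test statistic, which is equivalent to the mean because n is fixed.

```r
x <- c(99, 99.5, 65, 100, 99, 99.5, 99, 99.5, 99.5)   # first 9 values of x1
y <- c(99, 99.5, 99.5, 0, 50, 100, 99.5, 99.5, 0)     # first 9 values of y1
d <- x - y                                            # paired differences

# all 2^9 = 512 combinations of signs, one row per pattern
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(d))))
resS  <- as.vector(signs %*% d)    # sum of sign-flipped differences per pattern
obsS  <- sum(d)                    # observed sum of differences

# one-sided p-value ("greater"): share of patterns with a sum at least as large
p_exact_paired <- mean(resS >= obsS)
p_exact_paired
```

With many pairs the number of patterns grows as 2^n, so a Monte Carlo approximation as in the oneway_test() call above becomes the practical choice.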

Related Solutions

Paired Permutation Test – Using Paired Permutation Test for Repeated Measures and Dyadic Data

The coin package can handle this kind of analysis. See its webpage and the accepted answer to this question.

An implementation could look like the following.

# load the package
library(coin)

# Some toy data (dyad is recycled across the two conditions).
# stringsAsFactors = TRUE is needed in R >= 4.0 to get factors here.
s.data <- data.frame(dyad = c("F1-M1", "F1-M2", "F2-M1", "F2-M2"),
                     condition = c(rep("A", 4), rep("B", 4)),
                     dv = runif(8, 0, 1),
                     stringsAsFactors = TRUE)

# Make sure the factors are really factors!
str(s.data)

# here goes the permutation test, stratified by dyad
oneway_test(dv ~ condition | dyad, distribution = approximate(B=10000), data = s.data)

(Note that you need to use a high number for B in your real example.)

Furthermore, you could also use a paired t.test. I don't see any reasons against it:

t.test(subset(s.data, condition == "A", "dv", drop = TRUE), subset(s.data, condition == "B", "dv", drop = TRUE), paired = TRUE)

I think there is one important thing you haven't discussed so far: with this implementation you would have controlled for the dyads, but not for the individuals (e.g., F1, independent of her male interaction partner). This is a serious threat to your analysis.
I know that there is a lot of work on dyadic analyses in psychology and related fields. Unfortunately I cannot give you a specific pointer, but you should definitely look into this before finishing your analyses. A quick search on rseek.org returns at least a package called dyad and this webpage.

Permutation Test – Randomisation/Permutation Test for Paired Vectors in R

Though I pointed in comments to the use of the coin package I think it's worth illustrating that a permutation/randomization test is really quite simple, so I have done it.

Here I write some R code to do a randomization test for a one sample test of location. The test randomly flips signs on the differences and computes the mean; this is equivalent to randomly assigning each pair of values to the x and y groups. The code below could be made significantly shorter (I could do it in two lines easily enough, or even one if you didn't mind slower code).

This code takes a few seconds on my machine:

# assumes the two samples are in 'x' and 'y' and x[i] and y[i] are paired
# set up:
B <- 99999
d <- x-y
m0 <- mean(d)

# perform a one-sample randomization test on d
# for the null hypothesis H0: mu_d = 0   vs H1 mu_d != 0  (i.e. two tailed)
# here the test statistic is the mean
rndmdist <- replicate(B,mean((rbinom(length(d),1,.5)*2-1)*d))

# two tailed p-value:
sum( abs(rndmdist) >= abs(m0))/length(rndmdist)

That's the whole thing.

Note that (rbinom(length(d),1,.5)*2-1) gives a random -1 or 1, i.e. a random sign, so when we multiply by any set of signed d, it is equivalent to randomly assigning + or - signs to the absolute differences. [It doesn't matter what distribution of signs on d you start with; after the multiplication the d will have random signs.]
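
The same random signs can also be drawn directly with sample(), which some readers may find easier to read. A small sketch with made-up differences (the vector d and the seed below are illustrative only, not taken from the data above):

```r
set.seed(42)                           # arbitrary seed, for reproducibility only
d <- c(-1.2, 0.4, 2.5, -0.3, 1.1)      # made-up paired differences

s1 <- rbinom(length(d), 1, .5) * 2 - 1             # 0/1 draws mapped to -1/+1
s2 <- sample(c(-1, 1), length(d), replace = TRUE)  # signs drawn directly

# whatever signs d started with, the products keep |d| and get random signs
abs(s1 * d)
abs(s2 * d)
```

Either way, each difference keeps its magnitude and receives a + or - with probability 1/2.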

Here, I compare it with a t-test on some made up data:

 set.seed(seed=438978)
 z=rnorm(50,10,2)
 x=z-rnorm(50,0,.5)
 y=z+.4+rnorm(50,0,.5)
 t.test(y-x) # gives p = 0.003156

 B <- 99999
 d <- x-y
 m0 <- mean(d)
 rndmdist <- replicate(B,mean((rbinom(length(d),1,.5)*2-1)*d))
 sum( abs(rndmdist) >= abs(m0))/length(rndmdist)

When the t-test is valid it usually gives a very similar p-value to the completely enumerated permutation test, and a simulated p-value as above (when the number of simulations is sufficiently large) will converge to that second p-value.

At the number of replications used above, a true permutation p-value (i.e. from complete enumeration) of 0.05 will be estimated to within 0.001 (that is, will give a randomization p-value between 0.049 and 0.051) about 85% of the time and to within 0.002 over 99.5% of the time.
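
These figures can be checked roughly with a normal approximation to the binomial sampling error of the simulated p-value. This is a back-of-the-envelope sketch under that assumption, not necessarily the calculation used to obtain the numbers above:

```r
B  <- 99999                            # number of random sign assignments
p  <- 0.05                             # assumed true permutation p-value
se <- sqrt(p * (1 - p) / B)            # standard error of the simulated p-value

within_001 <- 2 * pnorm(0.001 / se) - 1   # P(|estimate - p| <= 0.001)
within_002 <- 2 * pnorm(0.002 / se) - 1   # P(|estimate - p| <= 0.002)
c(within_001, within_002)                 # roughly 0.85 and a bit above 0.995
```

Because the standard error shrinks with the square root of B, halving the error requires about four times as many replications.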
