Comparing a Single Sample Against a Mean Using a Permutation Test

permutation-test, t-test

When people implement permutation tests to compare a single sample against a mean (e.g., as you might do with a permutation t-test), how is the mean handled? I have seen implementations that take a mean and a sample as inputs to a permutation test, but it is unclear what they are actually doing under the hood. Is there even a meaningful way to do a permutation test (e.g., a t-test) for one sample versus an assumed mean? Or are they just defaulting to a non-permutation test under the hood (e.g., running a standard t-test or similar despite calling a permutation function or setting a permutation-test flag)?

In a standard two-sample permutation test, one has two groups and randomizes the assignment of group labels. How is this handled when one "group" is an assumed mean? Obviously, an assumed mean has no sample size in and of itself. So what is the typical way of working the mean into a permutation framework? Is the "mean" sample treated as a single point? As a sample of equal size to the observed group? As an infinitely large sample?

Given that an assumed mean is, well, assumed, I'd say it technically has either infinite support or whatever support you want to assume for it. However, neither of these is very useful for an actual calculation. An equal-sized sample with all values equal to the mean seems to be what some tests do in practice (e.g., you fill in the other half of each pair with the assumed location). This makes some sense, as it is the equal-length sample you would see if your assumed mean were correct and had no variance.
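To make that construction concrete, here is a tiny R sketch of the "fill in the other half of the pairs" idea (the values and variable names are purely illustrative):

x   <- c(2.1, 3.4, 1.9, 4.0, 2.8)  # observed sample
mu0 <- 3                           # assumed mean under the null
y   <- rep(mu0, length(x))         # the "other half of the pairs": an equal-sized
                                   #   sample whose values all equal mu0
d   <- x - y                       # a paired comparison then operates on these
                                   #   differences, i.e. x - mu0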

So my question is this: in practice, do people actually emulate permutation-test-style label randomization when the second set is a mean (or a similar abstract assumed value)? If so, how do they handle the label randomization?

Best Answer

Expanding Glen_b's comment into an answer

An approximate one-sample permutation test for the mean of a sample, against a null hypothesis of zero mean, is implemented by assigning random signs to the data in the sample. Non-zero null hypotheses can be tested by subtracting the desired null mean from the data.

This is easy to see in the source of the R function onetPermutation in the package DAAG. Here's an excerpt of the relevant code, with comments I've added:

function (x, nsim) {

  ## Initialize and pre-allocate

  n <- length(x)
  dbar <- mean(x)
  absx <- abs(x)  # note: this value is never used; the loop below recomputes abs(x) instead (harmless, but redundant)
  z <- array(, nsim)  # pre-allocate storage for the nsim permuted sample means


  ## Run the simulation    

  for (i in 1:nsim) {                             # Do nsim times:
      mn <- sample(c(-1, 1), n, replace = TRUE)   #  1. draw n random signs from {-1, 1}, where n is the length of the data to be tested
      xbardash <- mean(mn * abs(x))               #  2. apply the random signs to |x| and take the mean of the sign-flipped sample
      z[i] <- xbardash                            #  3. save this permuted-sample mean
  }


  ## Return the p value
  # p = the fraction of permuted means that are:
  #      at least as large as  |observed mean of x|, or
  #      at least as small as -|observed mean of x|   (i.e., a two-sided p value)

  (sum(z >= abs(dbar)) + sum(z <= -abs(dbar)))/nsim
}
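
The excerpt above only handles a null mean of zero. Following the note that a non-zero null can be tested by subtracting the assumed mean from the data first, here is a self-contained sketch of the same sign-flipping test against an arbitrary null mean mu0 (the function name and defaults are illustrative, not part of DAAG):

## Illustrative sign-flip permutation test of H0: mean(x) == mu0 (two-sided)
one_sample_perm_test <- function(x, mu0 = 0, nsim = 2000) {
  d    <- x - mu0                                         # shift so the null mean becomes zero
  dbar <- mean(d)                                         # observed mean of the shifted data
  z    <- numeric(nsim)
  for (i in seq_len(nsim)) {
    signs <- sample(c(-1, 1), length(d), replace = TRUE)  # random sign for each observation
    z[i]  <- mean(signs * abs(d))                         # mean of the sign-flipped sample
  }
  # two-sided p value: fraction of permuted means at least as extreme as the observed mean
  (sum(z >= abs(dbar)) + sum(z <= -abs(dbar))) / nsim
}

## Example usage: test whether x is consistent with a mean of 3
set.seed(1)
x <- rnorm(20, mean = 3.5)
one_sample_perm_test(x, mu0 = 3, nsim = 5000)

With mu0 = 0 this is essentially the same computation as the DAAG excerpt above.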