Comparing a Single Sample Against a Mean Using a Permutation Test

permutation-test, t-test

When people implement permutation tests to compare a single sample against a mean (e.g., as you might do with a permutation t-test), how is the mean handled? I have seen implementations that take a mean and a sample as inputs to a permutation test, but it is unclear what they are actually doing under the hood. Is there even a meaningful way to do a permutation test (e.g., a t-test) for one sample versus an assumed mean? Or are they just defaulting to a non-permutation test under the hood (e.g., running a standard t-test or similar despite calling a permutation function or setting a permutation-test flag)?

In a standard two-sample permutation test, one has two groups and randomizes the assignment of group labels. How is this handled when one "group" is an assumed mean? Obviously, an assumed mean has no sample size in and of itself. So what is the typical way of working the mean into a permutation framework? Is the "mean" sample treated as a single point? As a sample of equal size to the observed group? As an infinitely large sample?

Given that an assumed mean is, well, assumed, I'd say it technically has either infinite support or whatever support you want to assume for it. However, neither of these is very useful for an actual calculation. An equal-sized sample with all values equal to the mean seems to be what some tests do in practice (e.g., you fill in the other half of each pair with the assumed location). This makes some sense, as it is the equal-length sample you would see if your assumed mean were correct and had no variance.
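To make that construction concrete, here is a tiny R sketch of the "fill in the other half of the pairs" idea (the values and variable names are purely illustrative):

x   <- c(2.1, 3.4, 1.9, 4.0, 2.8)  # observed sample
mu0 <- 3                           # assumed mean under the null
y   <- rep(mu0, length(x))         # the "other half of the pairs": an equal-sized
                                   #   sample whose values all equal mu0
d   <- x - y                       # a paired comparison then operates on these
                                   #   differences, i.e. x - mu0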

So my question is this: in practice, do people actually emulate permutation-test-style label randomization when the second set is a mean (or a similar abstract assumed value)? If so, how do they handle the label randomization?

Best Answer

Expanding Glen_b's comment into an answer

An approximate one-sample permutation test for the mean of a sample, against a null hypothesis of zero mean, is implemented by assigning random signs to the data in the sample. Non-zero null hypotheses can be tested by subtracting the desired null mean from the data.

This is easy to see in the source of the R function onetPermutation in the package DAAG. Here's an excerpt of the relevant code, with comments I've added:

function (x, nsim) {

  ## Initialize and pre-allocate

  n <- length(x)
  dbar <- mean(x)
  absx <- abs(x)  # note: this value is never used; the loop below recomputes abs(x) instead (harmless, but redundant)
  z <- array(, nsim)  # pre-allocate storage for the nsim permuted sample means


  ## Run the simulation    

  for (i in 1:nsim) {                             # Do nsim times:
      mn <- sample(c(-1, 1), n, replace = TRUE)   #  1. draw n random signs from {-1, 1}, where n is the length of the data to be tested
      xbardash <- mean(mn * abs(x))               #  2. apply the random signs to |x| and take the mean of the sign-flipped sample
      z[i] <- xbardash                            #  3. save this permuted-sample mean
  }


  ## Return the p value
  # p = the fraction of permuted means that are:
  #      at least as large as  |observed mean of x|, or
  #      at least as small as -|observed mean of x|   (i.e., a two-sided p value)

  (sum(z >= abs(dbar)) + sum(z <= -abs(dbar)))/nsim
}
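
The excerpt above only handles a null mean of zero. Following the note that a non-zero null can be tested by subtracting the assumed mean from the data first, here is a self-contained sketch of the same sign-flipping test against an arbitrary null mean mu0 (the function name and defaults are illustrative, not part of DAAG):

## Illustrative sign-flip permutation test of H0: mean(x) == mu0 (two-sided)
one_sample_perm_test <- function(x, mu0 = 0, nsim = 2000) {
  d    <- x - mu0                                         # shift so the null mean becomes zero
  dbar <- mean(d)                                         # observed mean of the shifted data
  z    <- numeric(nsim)
  for (i in seq_len(nsim)) {
    signs <- sample(c(-1, 1), length(d), replace = TRUE)  # random sign for each observation
    z[i]  <- mean(signs * abs(d))                         # mean of the sign-flipped sample
  }
  # two-sided p value: fraction of permuted means at least as extreme as the observed mean
  (sum(z >= abs(dbar)) + sum(z <= -abs(dbar))) / nsim
}

## Example usage: test whether x is consistent with a mean of 3
set.seed(1)
x <- rnorm(20, mean = 3.5)
one_sample_perm_test(x, mu0 = 3, nsim = 5000)

With mu0 = 0 this is essentially the same computation as the DAAG excerpt above.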