Permutation Test – Assistance with One-Sample Permutation Z-Test

Tags: permutation-test, r

This is related to another question I asked recently. To recap:

[I had 30 people call a number and then roll a five-sided die. If the call matches the subsequent face, the trial is a hit; otherwise it is a miss. Each subject completes 25 trials (rolls), so each participant has a score out of 25. Since the die is a virtual one, it cannot be biased. Before the experiment was conducted I was going to compare the subjects' scores with a one-sample t-test (against mu of 5). However, I was pointed towards the more powerful z-test, which is appropriate here because we know the population parameters under the null hypothesis: everyone should score at chance. Since npq is large enough that the binomial approximates the normal (Gaussian) distribution, we can use a parametric test. So I could just forget about it all and go back to the t-test I planned to use, but it seems to me that although the z-test is not often used in real research, it is appropriate here. That was the conclusion from my previous question. Now I am trying to understand how to use resampling methods (either permutation or bootstrap) to complement my parametric analysis.]

Okay. I am trying to program a one-sample permutation z-test, using onet.permutation from the DAAG package as inspiration. This is as far as I've got:

perm.z.test <- function(x, mu, var, n, prob, nsim){
  nx <- length(x)
  mx <- mean(x)
  z <- numeric(nsim)
  for (i in 1:nsim) {
    mn <- rbinom(nx, size=n, prob=prob)
    zeta = (mean(mn) - mu) / (sqrt(var/nx))
    z[i] <- zeta
  }
  pval <- (sum(z >= abs(mx)) + sum(z <= -abs(mx)))/nsim
  print(signif(pval, 3))
}

Where: x is the variable to test, n is the number of trials (=25), and prob is the probability of getting a trial correct (=.2). The population value (mu) of the mean number correct is np. The population standard deviation, passed as var, is sqrt(np*[1-p]).
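For concreteness, here is a minimal sketch of those null-hypothesis parameters and the resulting z statistic for a sample of subject scores (the function name z_stat is mine, not part of any package):

```r
# Null-hypothesis parameters for n = 25 trials at chance p = 0.2
n <- 25
p <- 0.2
mu <- n * p                      # population mean number correct: 5
sigma <- sqrt(n * p * (1 - p))   # population standard deviation: 2

# z statistic for a vector x of subject scores, under the null:
# z = (mean(x) - mu) / (sigma / sqrt(number of subjects))
z_stat <- function(x) (mean(x) - mu) / (sigma / sqrt(length(x)))
```

A sample whose mean is exactly mu gives z = 0, as expected.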

Now I guess this compares x to an array of randomly generated binomial samples. If I centre x at 0 (variable - mu) I get a p-value. Can somebody confirm that it is doing what I think it is doing?

My testing gives this:

> binom.samp1 <- as.data.frame(matrix(rbinom(30*1, size=25, prob=0.2), 
                                     ncol=1))
> z.test(binom.samp1$V1, mu=5, sigma.x=2)
data:  binom.samp1$V1 
z = 0.7303, p-value = 0.4652

> perm.z.test(binom.samp1$V1-5, 5, 2, 25, .2, 2000)
[1] 0.892

> binom.samp1 <- as.data.frame(matrix(rbinom(1000*1, size=25, prob=0.2),
                                     ncol=1))
> perm.z.test(binom.samp1$V1-5, 5, 2, 25, .2, 2000)
[1] 0.937

Does this look right?

UPDATE:

Since this obviously doesn't do what I want, I do have another angle. This website offers this advice:

There is no reason whatsoever why a permutation test has to use any particular test statistic. Any test statistic will do! … For one-sample or paired two-sample tests, in particular, for Wilcoxon signed rank tests, the permutations are really subsets. The permutation distribution chooses an arbitrary subset to mark + and the complementary subset is marked -. Either subset can be empty.

What about an arbitrary subset with a one-sample z-test?

Best Answer

I don't think it is doing a one-sample Z; to me it looks like a test against a certain set of priors.

I'm confused: why are you doing a one-sample Z using binomial data as your source data? You could simply create a distribution of the number of successes under the null and see what quantile your actual data falls in. However, the above method doesn't look like a permutation test per se, as your code doesn't actually permute the observed values between 1 and 0.
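A minimal sketch of that quantile suggestion, assuming 30 subjects with 25 trials each at chance p = 0.2 (the observed total of 165 and all variable names here are hypothetical, for illustration only):

```r
set.seed(1)  # for reproducibility
n_subj <- 30; n_trials <- 25; p <- 0.2

# Null distribution of the total number of successes across all
# subjects: each simulated experiment is one draw from
# Binomial(n_subj * n_trials, p)
null_totals <- rbinom(10000, size = n_subj * n_trials, prob = p)

# Hypothetical observed total across all subjects
obs_total <- 165
quantile_of_obs <- mean(null_totals <= obs_total)  # proportion at or below
```

If quantile_of_obs falls in the extreme tails, the observed total is unlikely under chance.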

That being said, let me comment on your code. To me, it looks as though z is defined as

zeta = (mean(mn) - mu) / (sqrt(var/nx))
z[i] <- zeta

Thus, each score in z is like a Z score of the randomly created binomial vector, using the priors you've selected as the null hypothesis. But then you compare that Z to abs(mx), where mx is defined as the mean of your observed binomial vector. At the very least this looks like a problem to me: Z scores should be compared to other Z scores, or means to means.

As I alluded to above, it is odd that you'd put all of this under the structure of a Z-test. The Z score is nominally a linear transform of the difference between means; as such, the result of a test like this should be the same whether you use a Z score or simply look at the differences between means.
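A tiny illustration of why the two are equivalent, using made-up numbers: since z = (m - mu)/c with c > 0 is strictly increasing in the mean m, ranking simulated z-scores against an observed z gives exactly the same tail proportion as ranking simulated means against the observed mean.

```r
mu <- 5; cc <- 2 / sqrt(30)          # constants under the null
sim_means <- c(4.8, 5.1, 5.3, 4.6)   # made-up simulated means
sim_z <- (sim_means - mu) / cc       # their z-scores
obs_mean <- 5.2                      # made-up observed mean
obs_z <- (obs_mean - mu) / cc

p_mean <- mean(sim_means >= obs_mean)  # tail proportion on the mean scale
p_z <- mean(sim_z >= obs_z)            # tail proportion on the z scale
# p_mean and p_z are identical
```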

Moreover, what you are doing seems like an attempt to test the observed value against some priors rather than an actual one-sample permutation test. What you want is something like permbinom (code provided below), where each observed value could equally have been a success or not a success. This is in line with Fisher's classic example of the lady who claimed she could tell whether tea or milk was added first. Critically different from your test, the assumption of this permutation test is that the null hypothesis is fixed at p = .5.

permbinom <- function(x)
{
  newx <- x
  nx <- length(x)
  change <- rbinom(n=nx, size=1, prob=.5)
  # This code is readable but inefficient:
  # swap the values between 1 and 0 where change == 1
  for (i in 1:nx)
  {
    if ((change[i] == 1) & (x[i] == 1)) {newx[i] <- 0}
    if ((change[i] == 1) & (x[i] == 0)) {newx[i] <- 1}
  }
  return(newx)
}
permtest <- function(x, nsim=2000)
{
  permref <- rep(NA, nsim)
  obsn <- sum(x)
  for (i in 1:nsim)
  {
    permref[i] <- sum(permbinom(x))
  }
  pval <- min(c(
    (sum(permref > obsn) * 2 / nsim),
    (sum(permref < obsn) * 2 / nsim)
  ))
  return(pval)
}

I'm not 100% confident regarding how I'm calculating the p-value here; so if someone would kindly correct me if I'm doing it wrong I'll incorporate that as an edit.
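For comparison, one common convention — offered here as a sketch of a possible correction, not the definitive answer — counts permuted totals that tie the observed total as at least as extreme, doubles the smaller tail, and caps the result at 1 (so the p-value cannot exceed 1):

```r
# Two-sided permutation p-value, one common convention:
# ties count as "at least as extreme", result capped at 1.
# permref and obsn are as in permtest above.
perm_pval <- function(permref, obsn)
{
  min(1, 2 * min(mean(permref >= obsn), mean(permref <= obsn)))
}
```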

For reference, here is a faster permutation function for one-sample tests of binomial data.

permbinomf <- function(x)
{
  nx <- length(x)
  change <- rbinom(n=nx, size=1, prob=.5)
  # Vectorised swap: adding change flips 0 to 1 where change == 1,
  # and turns 1 into 2, which is then mapped back to 0
  newx <- x + change
  newx <- ifelse(newx == 2, 0, newx)
  return(newx)
}

Edit: The question is also put forth, "What about an arbitrary subset with a one-sample z-test?" That would also work, assuming you had a large enough sample to subset. However, it would not be a permutation test; it would be more akin to a bootstrap.

Edit 2: Perhaps the most important answer to your question is this: you are doing something acceptable (if you fix the Z-versus-mean computational error noted above), but you aren't doing what you think you are doing. You are comparing your results to results where the null hypothesis is true. This is essentially a Monte Carlo simulation, and if you correct the math (and I suggest you also simplify it), it is an acceptable technique for testing your hypothesis. Also note that my answer above is for a two-tailed test. As noted in the other question, you are ignoring the nesting of binomial observations under participants, but independence isn't an assumption of a permutation or Monte Carlo test, so you should be more or less fine. Though, as also noted there, you ignore the possibility that some people are doing better than chance while others are performing at chance.
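To make the correction concrete, here is a minimal sketch of that Monte Carlo approach with the Z-versus-mean error fixed: the observed mean score is compared to simulated mean scores, like to like (the function name and defaults are mine):

```r
# Monte Carlo test: compare the observed mean score to the
# distribution of mean scores simulated under the null (chance).
mc.mean.test <- function(x, n = 25, prob = 0.2, nsim = 2000)
{
  nx <- length(x)
  mu <- n * prob
  # Simulated mean scores for nx subjects under the null
  sim_means <- replicate(nsim, mean(rbinom(nx, size = n, prob = prob)))
  # Two-tailed: proportion of simulated means at least as far
  # from mu as the observed mean
  mean(abs(sim_means - mu) >= abs(mean(x) - mu))
}
```

A sample scoring exactly at chance (mean of 5) gives a p-value of 1, since every simulated mean is at least as far from mu as an observed difference of zero.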