I don't think it is doing a one-sample Z; to me it looks like a test against a certain set of priors.
I'm confused: why are you doing a one-sample Z using binomial data as your source data? You could simply create a distribution of N successes and see what quantile your actual data fell in. However, the above method doesn't look like a permutation test per se to me, as your code doesn't actually permute the observed values between 1 and 0.
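For instance, a simple Monte Carlo version of that idea might look like the sketch below (the sample size n, null probability p0, and observed count are assumptions made up for illustration):

```r
# Build a null distribution of the number of successes and see
# where the observed count falls
set.seed(1)
n <- 100; p0 <- 0.5          # assumed sample size and null probability
obs <- 62                    # assumed observed number of successes
ref <- rbinom(100000, size = n, prob = p0)
mean(ref >= obs)             # upper-tail quantile of the observed count
```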
That being said, let me comment on your code: to me it looks as though z is defined as
zeta = (mean(mn) - mu) / (sqrt(var/nx))
z[i] <- zeta
Thus, each score in Z is like a Z score of the randomly created binomial vector, using the priors you've selected as the null hypothesis. But then you compare that Z to abs(mx), where mx is defined as the mean of your observed binomial vector. At the very least this looks like a problem to me: Z scores should be compared to other Z scores, or means compared to means.
As I alluded to above, it is odd that you'd put all of this under the structure of a Z-test. The Z score is nominally a linear transform of the difference between means; as such, the result of a test like this should be the same whether you use a Z score or simply look at the difference between means.
Moreover, what you are doing seems like an attempt to test the observed value against some priors rather than an actual one-sample permutation test. What you want to test against is something like permbinom (code provided below), where, under the null, each observed value could equally well have been a success or a failure. This is in line with Fisher's classic example of the lady who claimed she could tell whether tea or milk was added first. Critically different from your test, the assumption of this permutation test is that the null hypothesis is fixed at p = .5.
permbinom <- function(x)
{
  newx <- x
  nx <- length(x)
  change <- rbinom(n=nx, size=1, prob=.5)
  #This code is readable but inefficient
  #Swap the values between 1 and 0 if change == 1
  for (i in 1:nx)
  {
    if ((change[i] == 1) & (x[i] == 1)) {newx[i] <- 0}
    if ((change[i] == 1) & (x[i] == 0)) {newx[i] <- 1}
  }
  return(newx)
}
permtest <- function(x, nsim=2000)
{
  permref <- rep(NA, nsim)
  obsn <- sum(x)
  for (i in 1:nsim)
  {
    permref[i] <- sum(permbinom(x))
  }
  #Two-tailed p-value: double the smaller tail proportion,
  #counting ties (>=, <=) and capping at 1
  pval <- min(1, 2 * min(mean(permref >= obsn),
                         mean(permref <= obsn)))
  return(pval)
}
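For instance, a quick check of permtest on simulated data (the sample sizes and success probabilities here are made up for illustration):

```r
set.seed(42)
x0 <- rbinom(50, 1, 0.5)   # data generated under the null (p = .5)
x1 <- rbinom(50, 1, 0.8)   # data generated far from the null
permtest(x0)               # typically a large p-value
permtest(x1)               # typically a very small p-value
```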
I'm not 100% confident regarding how I'm calculating the p-value here; so if someone would kindly correct me if I'm doing it wrong I'll incorporate that as an edit.
For reference, here is a faster permutation function for one-sample tests of binomial data.
permbinomf <- function(x)
{
  nx <- length(x)
  change <- rbinom(n=nx, size=1, prob=.5)
  #Vectorized swap: adding change turns 0 into 1 and 1 into 2,
  #then the 2s are mapped back to 0
  newx <- x + change
  newx <- ifelse(newx == 2, 0, newx)
  return(newx)
}
Edit: The question was also put forth, "What about an arbitrary subset with a one-sample z-test?". That would also work, assuming you had a large enough sample to subset. However, it would not be a permutation test; it would be more akin to a bootstrap.
Edit 2: Perhaps the most important answer to your question is this: you are doing something acceptable (once you fix the Z-vs-mean computational error noted above), but you aren't doing what you think you are doing. You are comparing your results to results where the null hypothesis is true. This is essentially a Monte Carlo simulation, and if you correct the math (and I suggest you also simplify it) it is an acceptable technique for testing your hypothesis. Also note that my answer above is for a two-tailed test. As noted in the other question, you are ignoring the nesting of binomial observations under participants, but independence isn't an assumption of a permutation or Monte Carlo test, so you should be more or less fine. Though, as also noted there, you ignore the possibility that some people are performing better than chance and others are performing at chance.
Why not simulate some data that has the same structure as your data, but without any correlations/relationships, then use that (probably multiple times) to see how your strategy behaves. If you use the permutation test for each of the 93 variables, you will still have multiple-comparison issues and are still likely to declare 4-5 correlations significant purely by chance.
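As a sketch of that simulation, generating null data of the same shape (93 predictors; the row count and number of repetitions are assumptions) shows roughly how many correlations pass p < .05 by chance alone:

```r
set.seed(3)
n <- 100; k <- 93                # assumed sample size; 93 variables
false.pos <- replicate(200, {
  y <- rnorm(n)                  # outcome with no real relationships
  X <- matrix(rnorm(n * k), n, k)
  p <- apply(X, 2, function(v) cor.test(v, y)$p.value)
  sum(p < .05)                   # count of "significant" correlations
})
mean(false.pos)                  # close to 93 * .05 = 4.65 by chance
```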
To correct for multiple comparisons you would need to do something more along the lines of combining all your correlation measures (probably transformed to a similar scale) and comparing the combined measure to the permutation values. Combinations to consider would be the maximum correlation (in absolute value) or the mean correlation.
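A sketch of the combined-measure idea using the maximum absolute correlation (the matrix X and outcome y here are simulated stand-ins for the real data):

```r
set.seed(4)
n <- 100; k <- 93                       # assumed dimensions
X <- matrix(rnorm(n * k), n, k)         # stand-in for the real predictors
y <- rnorm(n)                           # stand-in for the real outcome
maxcor <- function(X, y) max(abs(cor(X, y)))
obs <- maxcor(X, y)
ref <- replicate(1000, maxcor(X, sample(y)))  # permute y, recompute max
mean(ref >= obs)   # family-wise p-value for the strongest correlation
```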
Something more along the lines of the FDR would be to compare your strongest correlation to the strongest from the permuted distribution, then compare the 2nd strongest to the 2nd strongest from the permutations, etc.
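The step-down comparison could be sketched like this (again with simulated stand-in data; the dimensions are assumptions):

```r
set.seed(5)
n <- 100; k <- 93                       # assumed dimensions
X <- matrix(rnorm(n * k), n, k)         # stand-in for the real predictors
y <- rnorm(n)                           # stand-in for the real outcome
obs.sorted <- sort(abs(cor(X, y)), decreasing = TRUE)
ref.sorted <- replicate(1000,
  sort(abs(cor(X, sample(y))), decreasing = TRUE))
# compare the j-th strongest observed |cor| to the j-th strongest
# from each permutation
pvals <- sapply(seq_along(obs.sorted),
                function(j) mean(ref.sorted[j, ] >= obs.sorted[j]))
head(pvals)
```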
Having a mixture of different correlation measures will complicate this, but you could either analyze the groups separately or convert everything to a similar scale (the p-value would be one option) so that they are comparable.
Best Answer
Though I pointed in comments to the use of the coin package, I think it's worth illustrating that a permutation/randomization test is really quite simple, so I have done it. Here I write some R code to do a randomization test for a one-sample test of location. The test randomly flips signs on the differences and computes the mean; this is equivalent to randomly assigning each pair of values to the x and y groups. The code below could be made significantly shorter (I could do it in two lines easily enough, or even one if you didn't mind slower code).
This code takes a few seconds on my machine:
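A minimal sketch of such a sign-flipping randomization test (the variable names d and nsim, and the made-up differences, are assumptions):

```r
# Randomization test for a one-sample test of location:
# randomly flip the signs of the differences and recompute the mean
set.seed(1)
d <- rnorm(30, mean = 0.5)   # made-up paired differences (assumption)
nsim <- 100000               # number of random sign assignments
obs <- mean(d)
ref <- replicate(nsim, mean(d * (rbinom(length(d), 1, .5) * 2 - 1)))
pval <- mean(abs(ref) >= abs(obs))   # two-tailed randomization p-value
pval
```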
That's the whole thing.
Note that rbinom(length(d),1,.5)*2-1 gives a random -1 or 1, i.e. a random sign, so when we multiply it by any set of signed d, it is equivalent to randomly assigning + or - signs to the absolute differences. [It doesn't matter what distribution of signs on d you start with; now the d will have random signs.] Here, I compare it with a t-test on some made-up data:
When the t-test is valid it usually gives a very similar p-value to the completely enumerated permutation test, and a simulated p-value as above (when the number of simulations is sufficiently large) will converge to that second p-value.
At the number of replications used above, a true permutation p-value (i.e. from complete enumeration) of 0.05 will be estimated to within 0.001 (that is, will give a randomization p-value between 0.049 and 0.051) about 85% of the time and to within 0.002 over 99.5% of the time.
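The precision figures can be checked from the binomial standard error of a simulated p-value; assuming, say, N = 100000 replications:

```r
# A simulated p-value is a binomial proportion, so at p = .05 its
# standard error with N replications is sqrt(.05 * .95 / N)
N <- 100000                             # assumed number of replications
se <- sqrt(.05 * .95 / N)               # about 0.00069
pnorm(.001 / se) - pnorm(-.001 / se)    # about 0.85 (within 0.001)
pnorm(.002 / se) - pnorm(-.002 / se)    # over 0.995 (within 0.002)
```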