Solved – Resampling, binomial, z- and t-test: help with real data

hypothesis-testing, r

I am trying to understand how I can use resampling techniques to complement my pre-planned analyses. This is not homework. I have a 5-sided die. 30 subjects call a number (1-5) and then roll the die. If it matches, it's a hit; if not, it's a miss. Each subject does this 25 times.

If n is the number of trials (= 25) and p is the probability of a correct call (= 0.2), then the population mean number correct is mu = n*p = 5. The population standard deviation is sigma = sqrt(n*p*(1 - p)), which is 2.
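
The same quantities in R, just for reference:

n <- 25; p <- 1/5
n * p                  # population mean, mu = 5
sqrt(n * p * (1 - p))  # population standard deviation, sigma = 2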

The experimental hypothesis (H1) is that subjects in this study will score above chance (above mu). The null hypothesis (H0) assumes a binomial distribution for each subject (they will score at mu).

[Please don't get too worried about why I am doing this. If it helps you to understand the problem then you can think of it as an ESP test (and therefore I am testing the ability of subjects to score above mu). Also if it helps, imagine that the task is a virtual reality die throwing task, where the virtual 5-sided die performs according to chance. There can be no bias from an imperfect die because the die is virtual.]

Okay. So before I conducted the "experiment" I had planned to compare the 30 subjects' scores with a one-sample t-test (against the null that mu = 5). Then I discovered that the one-sample z-test is a more powerful test given what we know about the null hypothesis. Okay.

Here is a simulation of my data in R:

# simulate 30 subjects, each scoring the number of hits in 25 trials with p = 0.2
binom.samp1 <- rbinom(30, size=25, prob=0.2)

Now R has a binom.test function, which gives an exact p-value for the number of successes out of the total number of trials. For my collected data (not the simulated data above):

>binom.test(174, 750, 1/5, alternative="g")
number of successes = 174, number of trials = 750, p-value = 0.01722

Now the one-sample t-test that I had originally planned to use (mainly because I'd never heard of the alternatives – should've paid more attention in higher statistics):

>t.test(binom.samp1-5, alternative="g")
t = 1.7647, df = 29, p-value = 0.04407

and for completeness' sake, the one-sample z-test (from the BSDA package):

>z.test(binom.samp1, mu=5, sigma.x=2, alternative="g")
z = 2.1909, p-value = 0.01423

So. My first question is, am I right in concluding that binom.test is the correct test given the data and hypothesis? In other words, does the t-test approximate the z-test, which in turn approximates the exact binomial test on the pooled Bernoulli trials?
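
For what it's worth, here is a quick simulation I could use to see how closely the three p-values track one another under the null (it assumes the BSDA package for z.test, as above):

library(BSDA)

# one simulated null experiment: 30 subjects x 25 trials, p = 0.2
x <- rbinom(30, size=25, prob=0.2)
binom.test(sum(x), 30*25, p=1/5, alternative="greater")$p.value  # exact binomial
z.test(x, mu=5, sigma.x=2, alternative="greater")$p.value        # z approximation
t.test(x, mu=5, alternative="greater")$p.value                   # t approximation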

Now my second question relates to the resampling methods. I have several books by Philip Good and I've read plenty on permutation and bootstrapping. I was just going to use the one-sample permutation test given in the DAAG package:

>onet.permutation(binom.samp1-5)
0.114

And the perm.test function in the exactRankTests package gives this:

>perm.test(binom.samp1, mu=5, alternative="g", exact=TRUE)
T = 42, p-value = 0.05113

I have the feeling that what I want to do is conduct a one-sample permutation version of binom.test. The only way I can see it working is to take resamples of the 30 subjects, calculate binom.test on each, and repeat this for a large number of resamples. Does this sound like a reasonable idea?
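
Something like this rough sketch is what I have in mind (resample subjects with replacement and re-run binom.test on the pooled successes each time; I am not at all sure this is a valid procedure, which is partly the question):

scores <- rbinom(30, size=25, prob=0.2)        # stand-in for the 30 subjects' observed hit counts
boot.p <- replicate(1000, {
  resample <- sample(scores, replace=TRUE)     # resample subjects with replacement
  binom.test(sum(resample), length(resample)*25, p=1/5,
             alternative="greater")$p.value
})
summary(boot.p)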

Finally, I did repeat this experiment with the same equipment (the 5-sided die) but a larger sample size (50 people), and I got exactly what I expected. My understanding is that the two studies are like a Galton box that hasn't filled up yet. The n = 30 experiment has a bit of a skew, but had it been run for longer it would have filled out to the binomial. Is this all gibberish?

>binom.test(231, 1250, 1/5, alternative="g")
number of successes = 231, number of trials = 1250, p-value = 0.917

>t.test(binom.samp2-5)
t = -1.2249, df = 49, p-value = 0.2265

>z.test(binom.samp2, mu=5, sigma.x=2)
z = -1.3435, p-value = 0.1791

>onet.permutation(binom.samp2-5)
0.237

>perm.test(binom.samp2, mu=5, alternative="g", exact=TRUE)
T = 35, p-value = 0.8991

Best Answer

Answer #1: binom.test is in some ways a "more correct" test because it doesn't assume normality. Yes, you'll get more power out of the normality assumption, and it might be reasonable, but to whatever extent you violate the assumptions of the test, you may inflate your Type I error rate.

Explanation #1: Though with a high number of trials the results from a binomial data source approach normality, they are never perfectly normal. To convince yourself of this you can use a Shapiro-Wilk test for normality, e.g. shapiro.test(rbinom(30, 25, .2)) [where 30 is your number of participants, 25 is your number of trials, and .2 is the underlying probability of success]. You'll note that with random data, normality is sometimes significantly violated and sometimes not. Your own data will tell the story you need to know. But, in general, because it is possible to violate normality under these circumstances, I prefer to avoid making the assumption.
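
One quick way to see how often that happens is simply to repeat that simulation many times:

# proportion of simulated samples (30 subjects x 25 trials, p = 0.2) flagged as non-normal
p.vals <- replicate(1000, shapiro.test(rbinom(30, 25, 0.2))$p.value)
mean(p.vals < 0.05)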

Answer #2: See my answer elsewhere. What you are proposing sounds like a bootstrap of permutation test results. Don't do that; it is odd and you won't be able to publish it. The binom.test is sufficient for your data and hypothesis. I'd suggest that you don't confuse matters by doing a permutation test or parametric test where the binomial distribution is clearly the best fit for the process generating your data. Also, it is confusing that in one case you'd be willing to make assumptions (e.g. normality) but elsewhere want to use a permutation test. The strength of permutation tests is that they don't tend to make as many assumptions.
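
If you really want a resampling flavour, a sketch of a Monte Carlo test that simulates whole experiments under the binomial null and compares your observed total (174 of 750) against that reference distribution should give essentially the same answer as binom.test:

obs <- 174                                                            # observed successes in experiment 1
null.totals <- replicate(10000, sum(rbinom(30, size=25, prob=0.2)))   # totals under the null
mean(null.totals >= obs)                                              # one-sided Monte Carlo p-value; compare with binom.test's 0.01722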

Answer #3: It isn't gibberish. You might want to consider breaking your questions down in the future. It is a bit much for a single question here. In short, standard statistical approaches can lead to a failure to replicate in the way you describe because either 1) the results from experiment 1 were due to a Type I error or 2) the results from experiment 2 were due to a Type II error. Does N = 50 provide enough power that you can be confident in the results?
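
For instance, a rough power calculation for the second experiment, under the (purely illustrative) assumption that the true hit rate is the 174/750 = 0.232 you saw in experiment 1:

p.true <- 174/750                                # hypothetical "true" rate, taken from experiment 1
crit <- qbinom(0.95, size=1250, prob=0.2) + 1    # smallest total with one-sided p <= .05 under the null
1 - pbinom(crit - 1, size=1250, prob=p.true)     # power of the exact binomial test at 50 x 25 trials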
