Solved – Resampling, binomial, z- and t-test: help with real data

hypothesis-testing, r

I am trying to understand how I can use resampling techniques to complement my pre-planned analyses. This is not homework. I have a 5-sided die. 30 subjects call a number (1-5) and then roll the die. If it matches, it's a hit; if not, it's a miss. Each subject does this 25 times.

If n is the number of trials (= 25) and p is the probability of a correct call (= 0.2), then the population mean number correct is mu = n*p = 5. The population standard deviation is sigma = sqrt(n*p*(1 - p)), which is 2.
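
The same quantities in R, just for reference:

n <- 25; p <- 1/5
n * p                  # population mean, mu = 5
sqrt(n * p * (1 - p))  # population standard deviation, sigma = 2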

The experimental hypothesis (H1) is that subjects in this study will score above chance (above mu). The null hypothesis (H0) assumes a binomial distribution for each subject (they will score at mu).

[Please don't get too worried about why I am doing this. If it helps you to understand the problem then you can think of it as an ESP test (and therefore I am testing the ability of subjects to score above mu). Also if it helps, imagine that the task is a virtual reality die throwing task, where the virtual 5-sided die performs according to chance. There can be no bias from an imperfect die because the die is virtual.]

Okay. So before I conducted the "experiment" I had planned to compare the 30 subjects' scores with a one-sample t-test (against the null that mu = 5). Then I discovered that the one-sample z-test is a more powerful test given what we know about the null hypothesis. Okay.

Here is a simulation of my data in R:

# simulate 30 subjects, each scoring the number of hits in 25 trials with p = 0.2
binom.samp1 <- rbinom(30, size=25, prob=0.2)

Now R has a binom.test function, which gives an exact p-value for the number of successes out of the total number of trials. For my collected data (not the simulated data above):

>binom.test(174, 750, 1/5, alternative="g")
number of successes = 174, number of trials = 750, p-value = 0.01722

Now the one-sample t-test that I had originally planned to use (mainly because I'd never heard of the alternatives – should've paid more attention in higher statistics):

>t.test(binom.samp1-5, alternative="g")
t = 1.7647, df = 29, p-value = 0.04407

and for completeness' sake, the one-sample z-test (from the BSDA package):

>z.test(binom.samp1, mu=5, sigma.x=2, alternative="g")
z = 2.1909, p-value = 0.01423

So. My first question is, am I right in concluding that binom.test is the correct test given the data and hypothesis? In other words, does the t-test approximate the z-test, which in turn approximates the exact binomial test on the pooled Bernoulli trials?
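
For what it's worth, here is a quick simulation I could use to see how closely the three p-values track one another under the null (it assumes the BSDA package for z.test, as above):

library(BSDA)

# one simulated null experiment: 30 subjects x 25 trials, p = 0.2
x <- rbinom(30, size=25, prob=0.2)
binom.test(sum(x), 30*25, p=1/5, alternative="greater")$p.value  # exact binomial
z.test(x, mu=5, sigma.x=2, alternative="greater")$p.value        # z approximation
t.test(x, mu=5, alternative="greater")$p.value                   # t approximation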

Now my second question relates to the resampling methods. I have several books by Philip Good and I've read plenty on permutation and bootstrapping. I was just going to use the one-sample permutation test given in the DAAG package:

>onet.permutation(binom.samp1-5)
0.114

And the perm.test function in the exactRankTests package gives this:

>perm.test(binom.samp1, mu=5, alternative="g", exact=TRUE)
T = 42, p-value = 0.05113

I have the feeling that what I want to do is conduct a one-sample permutation version of binom.test. The only way I can see it working is to take resamples of the 30 subjects, calculate binom.test on each, and repeat this for a large number of resamples. Does this sound like a reasonable idea?
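
Something like this rough sketch is what I have in mind (resample subjects with replacement and re-run binom.test on the pooled successes each time; I am not at all sure this is a valid procedure, which is partly the question):

scores <- rbinom(30, size=25, prob=0.2)        # stand-in for the 30 subjects' observed hit counts
boot.p <- replicate(1000, {
  resample <- sample(scores, replace=TRUE)     # resample subjects with replacement
  binom.test(sum(resample), length(resample)*25, p=1/5,
             alternative="greater")$p.value
})
summary(boot.p)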

Finally, I did repeat this experiment with the same equipment (the 5-sided die) but a larger sample size (50 people), and I got exactly what I expected. My understanding is that the two studies are like a Galton box that hasn't filled up yet. The n = 30 experiment has a bit of a skew, but had it been run for longer it would have filled out to the binomial. Is this all gibberish?

>binom.test(231, 1250, 1/5, alternative="g")
number of successes = 231, number of trials = 1250, p-value = 0.917

>t.test(binom.samp2-5)
t = -1.2249, df = 49, p-value = 0.2265

>z.test(binom.samp2, mu=5, sigma.x=2)
z = -1.3435, p-value = 0.1791

>onet.permutation(binom.samp2-5)
0.237

>perm.test(binom.samp2, mu=5, alternative="g", exact=TRUE)
T = 35, p-value = 0.8991

Best Answer

Answer #1: binom.test is in some ways a "more correct" test because it doesn't assume normality. Yes, you'll get more power out of the normality assumption, and it might be reasonable, but to whatever extent you violate the assumptions of the test, you may inflate your Type I error rate.

Explanation #1: Though with a high number of trials the results from a binomial data source approach normality, they are never perfectly normal. To convince yourself of this you can use a Shapiro-Wilk test for normality, e.g. shapiro.test(rbinom(30, 25, .2)) [where 30 is your number of participants, 25 is your number of trials, and .2 is the underlying probability of success]. You'll note that with random data, normality is sometimes significantly violated and sometimes not. Your own data will tell the story you need to know. But, in general, because it is possible to violate normality under these circumstances, I prefer to avoid making the assumption.
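
One quick way to see how often that happens is simply to repeat that simulation many times:

# proportion of simulated samples (30 subjects x 25 trials, p = 0.2) flagged as non-normal
p.vals <- replicate(1000, shapiro.test(rbinom(30, 25, 0.2))$p.value)
mean(p.vals < 0.05)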

Answer #2: See my answer elsewhere. What you are proposing sounds like a bootstrap of permutation test results. Don't do that; it is odd and you won't be able to publish it. The binom.test is sufficient for your data and hypothesis. I'd suggest that you don't confuse matters by doing a permutation test or parametric test where the binomial distribution is clearly the best fit for the process generating your data. Also, it is confusing that in one case you'd be willing to make assumptions (e.g. normality) but elsewhere want to use a permutation test. The strength of permutation tests is that they don't tend to make as many assumptions.
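
If you really want a resampling flavour, a sketch of a Monte Carlo test that simulates whole experiments under the binomial null and compares your observed total (174 of 750) against that reference distribution should give essentially the same answer as binom.test:

obs <- 174                                                            # observed successes in experiment 1
null.totals <- replicate(10000, sum(rbinom(30, size=25, prob=0.2)))   # totals under the null
mean(null.totals >= obs)                                              # one-sided Monte Carlo p-value; compare with binom.test's 0.01722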

Answer #3: It isn't gibberish. You might want to consider breaking your questions down in the future. It is a bit much for a single question here. In short, standard statistical approaches can lead to a failure to replicate in the way you describe because either 1) the results from experiment 1 were due to a Type I error or 2) the results from experiment 2 were due to a Type II error. Does N = 50 provide enough power that you can be confident in the results?
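
For instance, a rough power calculation for the second experiment, under the (purely illustrative) assumption that the true hit rate is the 174/750 = 0.232 you saw in experiment 1:

p.true <- 174/750                                # hypothetical "true" rate, taken from experiment 1
crit <- qbinom(0.95, size=1250, prob=0.2) + 1    # smallest total with one-sided p <= .05 under the null
1 - pbinom(crit - 1, size=1250, prob=p.true)     # power of the exact binomial test at 50 x 25 trials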
