Solved – How many trials are needed to get a statistically important proportion of 0.003 for a binomial variable

Tags: binomial distribution, hypothesis testing, proportion, r, statistical significance

I want to run an experiment to estimate an initial proportion (a success rate). Each trial produces one of two results, success or failure. How many trials are needed for a success rate of 0.003 to be statistically important? I initially thought that 2667 trials with 8 successes would give a success rate of 0.003 = 8 / 2667. However, 26667 trials with 80 successes also give 0.003 = 80 / 26667, and so do 266667 trials with 800 successes (0.003 = 800 / 266667). Which number of trials (2667, 26667 or 266667) should I choose? From an economic perspective I would choose 2667 because it saves money and time, but from a statistical point of view, is 2667 enough to get a statistically important result (a proportion of 0.003)? I hope the question is clear this time. Please correct me if I am still wrong. Thanks.

Please ignore the text below, which was considered unclear; I have edited my question above.

I am facing a question about statistical testing based on proportions (testing whether a new idea can improve the success rate). In the control group I ran 26,667 trials and observed 80 successes, so the proportion is about 0.003. In the treatment group I ran the same number of trials, 26,667, and also observed 80 successes, so the treatment group gave the same proportion of 0.003, suggesting that the new idea does not help the success rate. The question is: if the proportions of success in the control and treatment groups are both 0.003, what is the minimal sample size for statistically testing the two equal proportions? I understand how to determine the sample size for testing two different proportions in R, given the significance level and power. For example:

power.prop.test(p1 = 0.003, p2 = 0.004, sig.level = 0.05, power = 0.8)

then the sample size is about 54748 for each of the control and treatment groups. However, my question concerns the case where the two proportions in the control and treatment groups are equal. In that case, what is the minimal sample size for testing the equal proportions? For example, instead of running 26,667 trials with 80 successes, I could run 2,667 trials with 8 successes; in both cases the success rate is 0.003, so I would need only 2,667 trials rather than 26,667 to get the result. How do I choose statistically between a sample size of 2,667 and one of 26,667 for testing the equal proportions? Or does my question not make sense at all? Basically, I am asking the opposite of the usual question: given two equal proportions rather than different ones, how do I determine the minimal sample size for a fixed power and significance level?

Thank you for giving useful suggestions in the reply.

To clarify the question, I can put it like this. If I want to obtain a success rate of 0.003, how many trials (each trial resulting in either success or failure) are needed for that proportion (0.003) to be statistically important? In theory I can run 100 trials, 1,000 trials, or even more, such as 10,000 trials. From a practical perspective, however, I should use the minimal number of trials because it saves money and time, as long as the result is still statistically important. So the question is: what is the minimal number of trials needed to get a statistically important success rate of 0.003? Here I used a power of 0.8 and a significance level of 0.05. Please help. Correct me if my question makes no sense. Thanks.

Based on the input from the two replies, I got the idea: to determine the sample size, two things must be given, (1) the effect/difference (e.g. p2 – p1 = 0.001) and (2) the power (e.g. 1 – beta = 0.8). Then, at a significance level of alpha = 0.05, the sample size can be estimated using a function like the one I posted above. Without these two conditions, the sample size cannot be estimated in my case.

Thank you.

Best Answer

if in the control and treatment groups, the proportions of success are both 0.003, then what is the minimal sample size for statistical testing of the two equal proportions

When you do hypothesis testing, the null hypothesis, when it is true, will be rejected at a rate equal to the significance level $\alpha$ that you choose; when the null hypothesis is not true, it will be rejected at a rate that is ideally much higher than the significance level.

What is important is not only the case "the proportions of success are both 0.003", but instead also the cases when those proportions are different. The more different the proportions are, the more probable it becomes that you will observe a significant difference and reject the null hypothesis.
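The two rejection rates described above can be checked with a small simulation. This is only a sketch: the helper `reject_rate`, the number of replications, and the use of a pooled two-sample z-test (a normal approximation) are assumptions made for illustration, not part of the original question.

```r
# Simulate the rejection rate of a two-sided pooled z-test for two proportions.
# reject_rate() is a hypothetical helper written for this illustration.
set.seed(1)
reject_rate <- function(p1, p2, n, reps = 2000, alpha = 0.05) {
  x1 <- rbinom(reps, n, p1)              # successes in the control group
  x2 <- rbinom(reps, n, p2)              # successes in the treatment group
  p_pool <- (x1 + x2) / (2 * n)          # pooled proportion under the null
  z <- (x1 / n - x2 / n) / sqrt(p_pool * (1 - p_pool) * 2 / n)
  mean(abs(z) > qnorm(1 - alpha / 2))    # fraction of simulated tests that reject
}

r_null <- reject_rate(0.003, 0.003, n = 26667)  # null true: should be near alpha
r_alt  <- reject_rate(0.003, 0.004, n = 26667)  # null false: should be clearly higher
c(null_true = r_null, null_false = r_alt)
```

When both proportions are 0.003, the rejection rate stays near the chosen significance level whatever the sample size; the sample size only changes how often a real difference is detected.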

To determine the sample size you need, you can express the probability of observing a significant difference, given a true difference (of some specific effect size), as a function of the sample size. So to compute the sample size you need 1) an idea of the minimal relevant difference/effect and 2) a desired level of power/probability.
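As a sketch of power as a function of sample size, the same `power.prop.test` function from the question can be called with `n` given and `power` left out, so that it returns the power. The grid of sample sizes below is an arbitrary choice for illustration; the effect (0.003 vs 0.004) is the one used in the question.

```r
# Power of the two-sided test for p1 = 0.003 vs p2 = 0.004 at several
# per-group sample sizes (the grid itself is an arbitrary illustration).
n_grid <- c(2667, 26667, 54748, 100000)
power_at_n <- sapply(n_grid, function(n) {
  power.prop.test(n = n, p1 = 0.003, p2 = 0.004, sig.level = 0.05)$power
})
data.frame(n_per_group = n_grid, power = round(power_at_n, 3))
```

The entry for 54748 should come out at roughly 0.8, matching the sample size reported in the question; smaller samples give correspondingly lower power for the same effect.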

It is important to specify this minimal difference, since in practice the null hypothesis is almost never exactly true. In one way or another the treatment might have a minuscule effect (far smaller than the size that was theoretically expected), and given a large enough sample you might show that the two groups differ by that minuscule amount.

When doing hypothesis testing, we often challenge the null hypothesis (there is no effect) in order to show whether there is an effect or not. But what researchers might actually be interested in is to challenge the alternative hypothesis (there is an effect) in order to show whether the hypothesized effect is true or not.

Note: There is a difference between 'not rejecting the null hypothesis' and 'rejecting the alternative hypothesis'.

Two ways to deal with this type of problem are the two one-sided tests procedure (TOST) and the likelihood ratio test. In both cases you explicitly specify both hypotheses (null and alternative).


To the point: to compute the sample size you can approximate the variables as normally distributed. In the simple case you use 0.003 as an initial value from which you compute the variance. A more difficult case arises when the proportions turn out to be smaller than initially expected: this reduces the number of successes, and what you actually want is a certain number of successes rather than a certain total sample size.
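A minimal sketch of the normal-approximation calculation follows, using the values from the question (p1 = 0.003, p2 = 0.004, significance level 0.05, power 0.8). The formula is the standard textbook approximation for comparing two proportions; the variable names are chosen here for illustration.

```r
# Per-group sample size for a two-sided test of two proportions,
# using the normal approximation to the binomial.
p1 <- 0.003; p2 <- 0.004        # baseline rate and the smallest rate worth detecting
alpha <- 0.05; power <- 0.80    # significance level and desired power

z_alpha <- qnorm(1 - alpha / 2) # critical value of the two-sided test
z_beta  <- qnorm(power)         # quantile corresponding to the desired power
p_bar   <- (p1 + p2) / 2        # pooled proportion under the null hypothesis

n_approx <- (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
             z_beta  * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p2 - p1)^2

ceiling(n_approx)               # trials needed per group
n_approx * p1                   # expected number of successes per group at rate p1
```

The result should be close to the value of about 54748 per group that `power.prop.test` reports in the question. The last line shows the expected count of successes, which is the quantity that shrinks if the true rate is lower than the assumed 0.003.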