Solved – How many trials are needed to get a statistically important proportion of 0.003 for a binomial variable

Tags: binomial distribution, hypothesis testing, proportion, r, statistical significance

I want to run an experiment to estimate an initial proportion (e.g., a success rate). Each trial yields one of two results (success or failure). From a statistical point of view, how many trials are needed for an observed success rate of 0.003 to be statistically important? I initially thought that if I ran 2667 trials and saw 8 successes, the success rate would be 8 / 2667 ≈ 0.003. However, I could also run 26667 trials with 80 successes, or 266667 trials with 800 successes, and get the same rate of about 0.003. Which number of trials (2667, 26667 or 266667) should I choose? From an economic perspective I would choose 2667 because it saves money and time, but from a statistical point of view, is 2667 enough to get a statistically important result (a proportion of 0.003)? I hope it is clear this time. Please correct me if I am still wrong. Thanks.

Please ignore the text below, which was considered unclear; I have edited my question above.

I currently face a question about statistical testing based on proportions (testing whether a new idea can improve the success rate). In the control group I ran 26,667 trials with 80 successes, so the proportion is about 0.003; in the treatment group I ran the same number of trials, 26,667, and also saw 80 successes, so the treatment group gave the same proportion of 0.003, indicating that the new idea does not help the success rate. The question is: if the proportions of success in the control and treatment groups are both 0.003, what is the minimal sample size for statistically testing the two equal proportions? I understand how to determine the sample size for testing two different proportions in R, given the significance level and power. For example:

power.prop.test(p1 = 0.003, p2 = 0.004, sig.level = 0.05, power = 0.8)

This gives a sample size of about 54,748 for each of the control and treatment groups. However, my question is about the case where the two proportions in the control and treatment groups are equal. In that case, what is the minimal sample size for testing the equal proportions? For example, instead of running 26,667 trials with 80 successes, I could run 2,667 trials with 8 successes; in both cases the success rate is 0.003. So it seems I need only 2,667 trials rather than 26,667 to get the result. How do I decide statistically between sample sizes of 2,667 and 26,667 for testing the equal proportions? Or does my question not make sense at all? Basically, I am asking the opposite of the ordinary question (i.e., given two equal proportions rather than different ones, how do I determine the minimal sample size for a fixed power and significance level?).

Thank you for giving useful suggestions in the reply.

To clarify the question, I can ask it like this. If I want to obtain a success rate of 0.003, how many trials (each resulting in either success or failure) are needed to make that proportion (0.003) statistically important? In theory I can run 100 trials, 1,000 trials, or even 10,000 trials (although from a practical perspective I should use the minimal number of trials, because that saves money and time, as long as the result is still statistically important). So the question is: what is the minimal number of trials needed to get a statistically important success rate of 0.003? Here I used a power of 0.8 and a significance level of 0.05. Please help. Correct me if my question makes no sense. Thanks.

Based on the input from the two replies, I got the idea: to determine the sample size, two things must be given, (1) the effect/difference (e.g. 0.001 = p2 – p1) and (2) the power (e.g. 1 − beta = 0.8). Then, at a significance level of alpha = 0.05, the sample size can be estimated using a function like the one I posted above. Without these two conditions, the sample size cannot be estimated in my case.

Thank you.

Best Answer

if in the control and treatment groups, the proportions of success are both 0.003, then what is the minimal sample size for statistical testing of the two equal proportions

When you do hypothesis testing, the null hypothesis, when it is true, will be rejected at a rate equal to the significance level $\alpha$ that you choose; when the null hypothesis is not true, it will be rejected at a rate that is ideally much higher than the significance level.

What is important is not only the case "the proportions of success are both 0.003", but instead also the cases when those proportions are different. The more different the proportions are, the more probable it becomes that you will observe a significant difference and reject the null hypothesis.
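The two rejection rates described above can be checked with a small simulation. This is only a sketch: the group size of 26,667 and the proportions 0.003 and 0.004 come from the question, while the number of simulated experiments, the seed, and the helper name `reject_rate` are arbitrary choices made for illustration.

```r
# Simulate many two-group experiments and count how often prop.test rejects.
set.seed(1)      # arbitrary seed, only for reproducibility
n <- 26667       # trials per group (from the question)
n_sim <- 2000    # number of simulated experiments (assumption)
alpha <- 0.05

# Hypothetical helper: fraction of simulated experiments with p < alpha
reject_rate <- function(p1, p2) {
  x1 <- rbinom(n_sim, size = n, prob = p1)
  x2 <- rbinom(n_sim, size = n, prob = p2)
  p_values <- mapply(
    function(a, b) prop.test(c(a, b), c(n, n))$p.value,
    x1, x2
  )
  mean(p_values < alpha)
}

rate_null <- reject_rate(0.003, 0.003)  # both groups truly at 0.003
rate_alt  <- reject_rate(0.003, 0.004)  # true difference of 0.001

print(c(null = rate_null, alternative = rate_alt))
```

Under equal proportions the rejection rate should stay near (or somewhat below) alpha, and it should rise once the true proportions differ.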

In order to determine what sample size is necessary, you could express something like the probability of observing a significant difference, given a true difference (of some specific effect size), as a function of the sample size. So to compute the sample size you need 1) an idea of a relevant minimal difference/effect and 2) a desired level of power/probability.
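As a rough illustration of power as a function of sample size, the sketch below calls `power.prop.test` with `n` given and `power` left unset, so that it returns the power. The candidate sample sizes are the ones mentioned in the question; the minimal difference of 0.001 (0.003 versus 0.004) is the asker's example, not a recommendation.

```r
# Power to detect 0.003 vs 0.004 at several per-group sample sizes.
n_values <- c(2667, 26667, 54748, 266667)

power_at_n <- sapply(n_values, function(n) {
  power.prop.test(n = n, p1 = 0.003, p2 = 0.004, sig.level = 0.05)$power
})

print(data.frame(n_per_group = n_values, power = round(power_at_n, 3)))
```

Reading the table from top to bottom shows how much power each candidate design has for that particular difference.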

It is important to specify this minimal difference, since in practice the null hypothesis is almost never exactly true. One way or another the treatment might have a tiny, minuscule effect (not of the size that was theoretically expected), and given a large enough sample you might show that the two groups differ by that tiny amount.

When doing hypothesis testing, we often challenge the null hypothesis (there is no effect) in order to show whether there is an effect or not. But what researchers might actually be interested in is to challenge the alternative hypothesis (there is an effect) in order to show whether the hypothesized effect is true or not.

Note: There is a difference between 'not rejecting the null hypothesis' and 'rejecting the alternative hypothesis'.

Two ways to deal with this type of problem are two one-sided tests (TOST) and a likelihood ratio test. In both cases you explicitly specify both hypotheses (null and alternative).
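A minimal sketch of the TOST idea for two proportions, using a normal approximation. The function name `tost_two_prop` and the equivalence margin of 0.001 are assumptions made for illustration; the counts (80 of 26,667 and 8 of 2,667 in each group) are taken from the question.

```r
# TOST for two proportions (normal approximation, unpooled standard error).
# Returns the larger of the two one-sided p-values; equivalence within
# +/- margin is concluded when this value is below the chosen alpha.
tost_two_prop <- function(x1, n1, x2, n2, margin) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  diff <- p2 - p1
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  p_lower <- pnorm((diff + margin) / se, lower.tail = FALSE)  # H0: diff <= -margin
  p_upper <- pnorm((diff - margin) / se)                      # H0: diff >= +margin
  max(p_lower, p_upper)
}

margin <- 0.001  # hypothetical equivalence margin

p_large <- tost_two_prop(80, 26667, 80, 26667, margin)
p_small <- tost_two_prop(8, 2667, 8, 2667, margin)

print(c(n_26667 = p_large, n_2667 = p_small))
```

With identical observed proportions, the larger design can support a claim of equivalence within the margin while the smaller one cannot, which is one way to see why 2,667 and 26,667 trials are not interchangeable.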

To the point: to compute the sample size you can approximate the variables as normally distributed. In the simple case you use 0.003 as an initial value from which to compute the variance. A more difficult case is when the proportions turn out to be smaller than initially expected: this reduces the number of successes, and what you actually want is a certain number of successes rather than a certain total sample size.
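The normal approximation mentioned above can be written out as a closed-form sample size. The sketch below assumes a two-sided test with equal group sizes; the function name `sample_size_two_prop` is hypothetical, and the comparison value 0.004 is the asker's example.

```r
# Per-group sample size for comparing two proportions (normal approximation).
sample_size_two_prop <- function(p1, p2, alpha = 0.05, power = 0.8) {
  p_bar <- (p1 + p2) / 2
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
  z_power <- qnorm(power)
  numerator <- z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
    z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
  (numerator / (p1 - p2))^2
}

n_formula <- sample_size_two_prop(0.003, 0.004)
n_builtin <- power.prop.test(p1 = 0.003, p2 = 0.004,
                             sig.level = 0.05, power = 0.8)$n

# Expected number of successes in the baseline group at that sample size
expected_successes <- n_formula * 0.003

print(c(formula = n_formula, builtin = n_builtin,
        expected_successes = expected_successes))
```

If the true baseline proportion is lower than 0.003, the same number of trials yields fewer successes, so it can be more useful to plan around the expected number of successes than around the total number of trials.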

Related Solutions

Solved – Testing for differences in very small proportions

I find that it helps to think in events rather than proportions to get the general scale needed, and then go to a more precise power calculation. For rare events, the sampling error is roughly the square root of the number of events. So if your group b has a proportion of 0.007, that's 700 expected events in a sample of 100,000 cases, with a sampling error of around 26 events. So it seems that you shouldn't be as far from adequate power as your output from the pwr package suggests; a proportion of 0.008 in a sample of 100,000 cases has 800 expected events.
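The event-count reasoning can be sketched in a few lines of R. The figures (100,000 cases, proportions 0.007 and 0.008) are from the answer; calling the second group "a" is an assumption, since only group b is named.

```r
# Think in expected event counts rather than proportions.
n <- 100000
p_a <- 0.008  # assumed to be the other group
p_b <- 0.007

events_a <- n * p_a  # expected events in group a
events_b <- n * p_b  # expected events in group b

# Binomial standard deviation of the event count, close to sqrt(events)
sd_b <- sqrt(n * p_b * (1 - p_b))

# Standard error of the difference in event counts between the groups
se_diff <- sqrt(n * p_a * (1 - p_a) + n * p_b * (1 - p_b))

# Expected difference in events, in units of its standard error
z <- (events_a - events_b) / se_diff

print(c(events_a = events_a, events_b = events_b,
        sd_b = sd_b, se_diff = se_diff, z = z))
```

An expected difference of 100 events is a bit more than two and a half standard errors here, which is the sense in which the design is not hopelessly underpowered.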

Double-check that the input to the program you used in the pwr package is correct. I don't use it, but it seems that there is a specific definition of "effect size" in the ES.h() program in that package. Using that formula for proportions of 0.007 and 0.008 gives me an "effect size" of 0.011, not the simple proportion difference of 0.001 you seemed to have specified in calling the program.
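For reference, the effect size computed by `ES.h()` in the pwr package is the difference of arcsine-transformed proportions. The sketch below reproduces that calculation in base R so that it runs without installing pwr.

```r
# Arcsine-transformed effect size h for two proportions,
# computed in base R rather than via pwr::ES.h().
p1 <- 0.008
p2 <- 0.007

h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
raw_difference <- p1 - p2

print(c(h = h, raw_difference = raw_difference))
```

The value of h is roughly ten times the raw difference of 0.001, so passing 0.001 where h is expected would greatly overstate the required sample size.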

You can't get away from the need for large numbers of cases with low proportions, but things might not be quite so bad for your present application as you fear.

Solved – R – power.prop.test, prop.test, and unequal sample sizes in A/B tests

Is this method sound or at least on the right track?

Yes, I think it's a pretty good approach.

Could I specify alt="greater" on prop.test and trust the p-value even though power.prop.test was for a two-sided test?

I'm not certain, but I think you'll need to use alternative="two.sided" for prop.test.

What if the p-value was greater than .05 on prop.test? Should I assume that I have a statistically significant sample but there is no statistically significant difference between the two proportions? Furthermore, is statistical significance inherent in the p-value in prop.test - i.e. is power.prop.test even necessary?

Yes, if the p-value is greater than .05 then you cannot conclude that there is a difference between the samples. Yes, statistical significance is inherent in the p-value, but power.prop.test is still necessary before you start your experiment to determine your sample size: power.prop.test is used to set up your experiment, and prop.test is used to evaluate its results.

BTW - You can calculate the confidence interval for each group and see if they overlap at your confidence level. You can do that by following these steps for Calculating Many Confidence Intervals From a t Distribution.

To visualize what I mean, look at this calculator with your example data plugged in: http://www.evanmiller.org/ab-testing/chi-squared.html#!2300/20000;2100/20000@95

Here is the result:

[Figure: confidence interval for each group]

Notice the graphic it provides that shows the range of the confidence interval for each group.
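The same check can be sketched in R with `prop.test`. The counts (2300 of 20000 and 2100 of 20000, at 95% confidence) are the ones encoded in the calculator link above; which count belongs to which group is an assumption.

```r
# Per-group confidence intervals and a two-sample test of proportions.
successes <- c(2300, 2100)
trials <- c(20000, 20000)

# 95% confidence interval for each group separately
ci_a <- prop.test(successes[1], trials[1], conf.level = 0.95)$conf.int
ci_b <- prop.test(successes[2], trials[2], conf.level = 0.95)$conf.int

# Do the two intervals overlap?
intervals_overlap <- ci_a[1] <= ci_b[2] && ci_b[1] <= ci_a[2]

# Direct two-sided test of the difference between the groups
two_sample <- prop.test(successes, trials, alternative = "two.sided")

print(rbind(group_a = ci_a, group_b = ci_b))
print(intervals_overlap)
print(two_sample$p.value)
```

Comparing intervals is a conservative shortcut: intervals that do not overlap imply a significant difference, but overlapping intervals do not rule one out, so the two-sample test is the more direct check.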

What if I can't do a 50/50 split and need to do, say, a 95/5 split? Is there a method to calculate sample size for this case?

This is why you need to use power.prop.test because the split doesn't matter. What matters is that you meet the minimum sample size for each group. If you do a 95/5 split, then it'll just take longer to hit the minimum sample size for the variation that is getting the 5%.
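A rough sketch of the "it'll just take longer" point. Note that `power.prop.test` reports `n` per group and assumes equal group sizes, so this follows the answer's rule of thumb of reaching that minimum in the smaller group. The value of `daily_visitors` and the helper `days_needed` are hypothetical, and the proportions 0.2 and 0.25 are borrowed from the example values used later in this answer.

```r
# Days needed for the smaller arm to reach the per-group minimum sample size.
n_per_group <- ceiling(
  power.prop.test(p1 = 0.2, p2 = 0.25, sig.level = 0.05, power = 0.8)$n
)

daily_visitors <- 1000  # hypothetical total traffic per day

# Hypothetical helper: days until the arm receiving `share_small` of the
# traffic has collected n_per_group observations
days_needed <- function(share_small) {
  ceiling(n_per_group / (daily_visitors * share_small))
}

days_even   <- days_needed(0.50)  # 50/50 split
days_skewed <- days_needed(0.05)  # 95/5 split

print(c(n_per_group = n_per_group,
        days_even = days_even, days_skewed = days_skewed))
```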

What if I have no idea what my baseline prediction should be for proportions? If I guess and the actual proportions are way off, will that invalidate my analysis?

You'll need to draw a line in the sand, guess a reasonable detectable effect, and calculate the necessary sample size. If you don't have enough time, resources, etc. to meet the calculated sample size in power.prop.test, then you'll have to lower your detectable effect. I usually set it up like this and run through different delta values to see what the sample size would need to be for that effect.

# Significance level (alpha)
alpha <- 0.05

# Statistical power (1 - beta)
power <- 0.8

# Baseline conversion rate
p <- 0.2

# Minimum detectable effect
delta <- 0.05

power.prop.test(p1 = p, p2 = p + delta, sig.level = alpha, power = power, alternative = "two.sided")
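To run through several delta values at once, the call can be wrapped in `sapply`. This is a sketch: the list of deltas is an arbitrary illustration, and the baseline of 0.2 is the example value from the snippet in this answer.

```r
# Required per-group sample size for a range of minimum detectable effects.
alpha <- 0.05
power <- 0.8
p <- 0.2
deltas <- c(0.01, 0.02, 0.05, 0.10)  # illustrative values

n_needed <- sapply(deltas, function(d) {
  ceiling(power.prop.test(p1 = p, p2 = p + d,
                          sig.level = alpha, power = power)$n)
})

print(data.frame(delta = deltas, n_per_group = n_needed))
```

The required sample size drops quickly as the detectable effect grows, which makes the trade-off between budget and sensitivity easy to read off.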
