Hypothesis Testing – How to Power a Strong Superiority Hypothesis in an A/B Test

ab-test, effect-size, hypothesis testing, statistical-power

Suppose I would like to perform an A/B test where it only makes sense to implement the "treatment" over the "control" if the "treatment" is at least 2% better than the "control". Let's suppose that the current "control" conversion rate is 3% and an absolute increase of 2% is required, to attain a conversion rate of 5%.

In the case of a strong superiority hypothesis we can state the hypotheses as:

H0: Difference between treatment and control <= 2%

Ha: Difference between treatment and control > 2%

How should we power this experiment? Which effect size should we choose, if we are okay with any effect size above 2%?

Aside:
If instead of a strong superiority null hypothesis we just used a standard null hypothesis, where the difference is assumed to be 0%, we could power the experiment to achieve 80% power for an effect size of 2%. However, rejecting that null only shows that the difference is greater than 0%. It does not "guarantee" a true effect size of 2%, even if the observed effect size is 2%.

Best Answer

Your hypothesis is as follows:

$$ \begin{aligned} H_0 &: p_t - p_c \le 0.02 \\ H_1 &: p_t - p_c > 0.02 \end{aligned} $$

First, let's say the actual proportion for the control group, $p_c$, is 0.03 and the proportion for the treatment group, $p_t$, is 0.05. In this case, no sample size, however large, will give you 80 percent power to reject the null, because 0.05 - 0.03 is exactly 0.02, i.e. the null is true and the rejection probability stays at $\alpha$. Substituting 0.051 for 0.05 won't help much: the alternative is then true, but only by 0.001. So basically, unless the difference is substantially larger than 0.02, you're going to need a huge sample size.
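To make this concrete, here is a minimal sketch of the approximate power of the test as a function of the per-group sample size. It assumes the normal approximation with unpooled variances; the function name `superiority_power` and the chosen sample sizes are illustrative and do not come from any package.

```r
# Approximate power of the one-sided superiority test
# (normal approximation, unpooled variance; names are illustrative).
superiority_power <- function(n, p_c, p_t, delta, alpha = 0.05) {
  se <- sqrt((p_c * (1 - p_c) + p_t * (1 - p_t)) / n)
  pnorm((p_t - p_c - delta) / se - qnorm(1 - alpha))
}

n_values <- c(1e3, 1e5, 1e7)  # per-group sample sizes to compare

# True difference exactly equal to the margin: power stays at alpha
power_at_margin <- superiority_power(n_values, p_c = 0.03, p_t = 0.05, delta = 0.02)

# True difference only 0.001 above the margin: power is still low at moderate n
power_just_above <- superiority_power(n_values, p_c = 0.03, p_t = 0.051, delta = 0.02)

print(round(power_at_margin, 3))
print(round(power_just_above, 3))
```

With a true difference equal to the margin, the computed power equals $\alpha$ for every sample size. With a true difference of 0.021, it only approaches useful levels once the per-group sample size is in the hundreds of thousands.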

Assuming equal sample sizes such that $n_c = n_t$, the required sample size for a one-sided superiority test is as follows:

$$ n_t = \big(p_c (1 - p_c) + p_t (1 - p_t)\big) \bigg(\frac{z_{1 - \alpha} + z_{1 - \beta}} {p_c - p_t + \delta}\bigg)^2 $$

The only difference between this and a standard formula for the difference between two proportions is the addition of $\delta$.
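As a sketch of where $\delta$ enters (assuming the formula comes from the usual normal approximation with unpooled variances, and writing $\Phi$ for the standard normal CDF): the test rejects when the standardized difference, shifted by $\delta$, exceeds $z_{1 - \alpha}$, so the power is approximately

$$ 1 - \beta \approx \Phi\!\left(\frac{p_t - p_c - \delta}{\sqrt{\big(p_c (1 - p_c) + p_t (1 - p_t)\big) / n_t}} - z_{1 - \alpha}\right) $$

Solving for $n_t$ gives

$$ n_t = \big(p_c (1 - p_c) + p_t (1 - p_t)\big) \bigg(\frac{z_{1 - \alpha} + z_{1 - \beta}} {p_t - p_c - \delta}\bigg)^2 $$

which matches the formula above, since the sign of the denominator disappears when it is squared. Setting $\delta = 0$ recovers the usual one-sided formula for the difference between two proportions.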

We can calculate this in R as follows. Here we assume that the treatment proportion is equal to 0.07.

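p_c <- 0.03    # control conversion rate
p_t <- 0.07    # assumed treatment conversion rate
delta <- 0.02  # superiority margin
alpha <- 0.05  # one-sided significance level
beta <- 0.2    # type II error rate (power = 1 - beta)

# Required sample size per group
(p_c * (1 - p_c) + p_t * (1 - p_t)) * ((qnorm(1 - alpha) + qnorm(1 - beta)) / (p_c - p_t + delta))^2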

This suggests that we would need a sample size of ~1456 for each group (treatment and control). However, if you run this assuming a treatment proportion of 0.051, you'll see that you would need roughly 479,000 per group, or close to 1 million in total, to achieve 80 percent power.
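To see how sharply the requirement depends on the assumed treatment proportion, the calculation can be wrapped in a function and evaluated over a grid. This is only a sketch: the function name `superiority_n` and the grid of values are illustrative choices, and the result is rounded up to a whole number of observations.

```r
# Per-group sample size for a one-sided superiority test of two proportions,
# using the formula from the answer (function name is illustrative).
superiority_n <- function(p_c, p_t, delta, alpha = 0.05, beta = 0.2) {
  variance_sum <- p_c * (1 - p_c) + p_t * (1 - p_t)
  z_sum <- qnorm(1 - alpha) + qnorm(1 - beta)
  # Round up, since a fractional observation cannot be collected
  ceiling(variance_sum * (z_sum / (p_t - p_c - delta))^2)
}

# Candidate treatment proportions, all above the 0.05 boundary
p_t_grid <- c(0.051, 0.055, 0.06, 0.07, 0.08)
n_per_group <- superiority_n(p_c = 0.03, p_t = p_t_grid, delta = 0.02)

print(data.frame(p_t = p_t_grid, n_per_group = n_per_group))
```

The required sample size falls steeply as the assumed treatment proportion moves away from the 0.05 boundary, which is why the choice of effect size dominates the planning.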

Thus, as Christian mentioned above, there is no standard answer to the required sample size for a hypothesis test because it really depends on the effect size. If you knew the effect size, though, you wouldn't need a hypothesis test. So you'll just have to use your best judgment, perhaps based on effect sizes detected in previous A/B tests that you think might be similar to this one.

For an online calculator for superiority tests, which also includes a few more formulas and additional R code, see http://powerandsamplesize.com/Calculators/Compare-2-Proportions/2-Sample-Non-Inferiority-or-Superiority
