What’s the “best” way to calculate sample size for A/B tests

I've read several seemingly conflicting accounts on the best way to calculate sample size. Visual Website Optimizer (VWO) has a lengthy article on this topic. So does Evan Miller. And so does Optimizely.

I used each tool to estimate the sample size with the following settings:

Baseline Conversion Rate: 3%
Minimum Detectable Effect: 20% (relative)
Significance: 95%
Variations: 2

I get the following from the various calculators:

VWO (daily visitors must be set to 1 to get the exact sample size): 25,867
Evan Miller (set to relative, stat. power 80%): 13,050
Optimizely: 13,000
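To see where the figures around 13,000 come from, here is a minimal sketch of the usual normal-approximation formula for a two-sided comparison of two conversion rates, using a relative MDE (a 3% baseline with a 20% relative lift means a 3.6% treatment rate). The function name `sample_size_per_variation` is hypothetical, and evaluating the null standard deviation at the baseline rate is an assumption; this is not claimed to be any calculator's actual source code. With the settings above (alpha = 0.05, power = 80%) it gives about 13,050 visitors per variation, in line with the Evan Miller figure.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80, two_sided=True):
    """Approximate visitors needed in EACH variation to detect a relative lift.

    Hypothetical helper: normal approximation for two proportions, with the
    null standard deviation evaluated at the baseline rate (an assumption).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # relative MDE: 3% -> 3.6%
    delta = p2 - p1                      # absolute difference to detect
    z = NormalDist().inv_cdf             # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    sd_null = sqrt(2 * p1 * (1 - p1))               # spread if there is no lift
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))    # spread if the lift is real
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / delta ** 2


n = sample_size_per_variation(0.03, 0.20)
print(f"{n:,.0f} per variation")  # about 13,050
```

Other calculators may use a pooled variance or an exact test instead, which is one reason their outputs differ slightly.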

Given the seemingly different methods of calculation, which one is the "best" to use? I'm trying to understand how to approach this issue of sample size. Thanks!

References:

Articles:

vwo.com/blog/how-to-calculate-ab-test-sample-size/
www.evanmiller.org/how-not-to-run-an-ab-test.html
help.optimizely.com/hc/en-us/articles/200133789-How-long-to-run-a-test

Calculators:

vwo.com/ab-split-test-duration/
www.evanmiller.org/ab-testing/sample-size.html
www.optimizely.com/resources/sample-size-calculator/?conversion=3&effect=20&significance=95

Best Answer

There is no single best method, because each calculator relies on specific assumptions about the testing methodology. Evan Miller's calculator computes the sample size for a two-tailed test. In the past, Optimizely's calculator computed sample sizes for a one-tailed test. Optimizely now uses a Bayesian stats engine, and because of how that engine is built, its sample size calculator has no input for power. You can back into the per-variation sample size from the VWO calculator by multiplying the daily traffic by the number of days the test will run and dividing by the number of variations. With daily visitors set to 1, that gives 25,867 / 2 ≈ 12,934 per variation, close to Evan Miller's 13,050, which suggests VWO also calculates sample size generically, like Evan's calculator, for a two-tailed hypothesis.
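A sketch of the answer's two points, reusing the same kind of hypothetical `sample_size_per_variation` helper (normal approximation, null standard deviation at the baseline rate, both assumptions): switching to a one-tailed test lowers the required sample, and the VWO figure backed out per variation lands near the two-tailed number rather than the one-tailed one. `vwo_per_variation` is also a hypothetical name for the answer's back-calculation.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80, two_sided=True):
    # Hypothetical helper: normal approximation for two proportions,
    # null standard deviation evaluated at the baseline rate (assumption).
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    delta = p2 - p1
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    sd_null = sqrt(2 * p1 * (1 - p1))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / delta ** 2


def vwo_per_variation(daily_visitors, days, variations):
    # The answer's back-calculation: daily traffic * days / variations.
    return daily_visitors * days / variations


two_tailed = sample_size_per_variation(0.03, 0.20, two_sided=True)
one_tailed = sample_size_per_variation(0.03, 0.20, two_sided=False)
vwo = vwo_per_variation(daily_visitors=1, days=25_867, variations=2)

print(f"two-tailed: {two_tailed:,.0f}")  # about 13,050
print(f"one-tailed: {one_tailed:,.0f}")  # about 10,316
print(f"VWO, backed out: {vwo:,.1f}")    # 12,933.5
```

A one-tailed test only looks for a lift in one direction, so at the same significance level its critical value is smaller (about 1.645 instead of 1.960), which is why it needs fewer visitors per variation.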
