Assumptions
I consider an A/B test where there is a control group and a variant group. Each observation can either be true (converted) or false (not converted).
I split the incoming users evenly and at random between the two treatments.
So, the results can be summarized in a contingency table:
| | Converted | Not converted |
|-------|-----------|---------------|
|Control| | |
|Variant| | |
Let the conversion rate be Converted / (Converted + Not converted).
The null hypothesis is that the conversion rate is independent of the treatment.
It seems like in this case, I can use either the two-tailed $z$-test or the $\chi^2$-test. Feel free to correct me on this one.
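As a rough check of the claim that either test can be used here, the sketch below computes the pooled two-proportion $z$ statistic and the Pearson $\chi^2$ statistic (without continuity correction) on a 2x2 table and compares $z^2$ with $\chi^2$. The counts are made up purely for illustration, and the helper names `two_proportion_z` and `pearson_chi2` are my own, not part of any library.

```python
from math import sqrt


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (no continuity correction)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: [converted, not converted] for control and variant.
control = [100, 900]
variant = [130, 870]

z = two_proportion_z(control[0], sum(control), variant[0], sum(variant))
chi2 = pearson_chi2([control, variant])

print(f"z^2  = {z ** 2:.6f}")
print(f"chi2 = {chi2:.6f}")
```

The two printed values agree up to floating-point error, which is why the two-tailed $z$-test and the uncorrected $\chi^2$-test give the same p-value on a 2x2 table.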
Determining the sample size
I want to use statsmodels.stats.power.GofChisquarePower.solve_power and statsmodels.stats.power.NormalIndPower.solve_power.
For example:
import statsmodels.stats.power as power
zpower = power.NormalIndPower()
chipower = power.GofChisquarePower()
zpower.solve_power(0.1, nobs1=None, alpha=0.05, power=0.9, ratio=1.) # Returns ~2100
chipower.solve_power(0.1, nobs=None, alpha=0.05, power=0.9) # Returns ~1050
Question: I am puzzled by the huge difference. What is the reason for it? Am I using something incorrectly given my assumptions?
N.B. I now realize that the documentation states that GofChisquarePower.solve_power "(solves) for any one parameter of the power of a one sample chisquare-test" and NormalIndPower.solve_power "(solves) for any one parameter of the power of a two sample z-test".
What is the difference between a one-sample test and a two-sample test?
Best Answer
Simply put:
A one-sample test compares a sample mean ($\mu_0$) against a known population mean ($\mu$). Think about testing the height of a sample of females against the average female height according to the latest census.
A two-sample test compares a sample mean from one group ($\mu_1$) against a sample mean from another, independent group ($\mu_2$).
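This is why the required sample size per group for a two-sample test is double the required sample size for a one-sample test: both means are estimated from data, so their difference has twice the variance of a single sample mean. Note that nobs1 in NormalIndPower is the size of each group, so with ratio=1 the ~2100 figure applies to the control and to the variant separately. A/B tests should use a two-sample test.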
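The factor of two can be reproduced with the textbook normal-approximation formula, $n = \left((z_{1-\alpha/2} + z_{\text{power}}) / d\right)^2$ for one sample and twice that per group for two equally sized samples. The sketch below is only an approximation under that assumption (it ignores the negligible opposite tail) and is not how statsmodels solves the problem internally; the function names are hypothetical.

```python
from statistics import NormalDist


def n_one_sample(effect_size, alpha=0.05, power=0.9):
    """Approximate sample size for a two-sided one-sample z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return ((z_alpha + z_power) / effect_size) ** 2


def n_two_sample_per_group(effect_size, alpha=0.05, power=0.9):
    """Approximate size of EACH group for a two-sided two-sample z-test.

    With equal group sizes the difference of two means has variance
    2 * sigma^2 / n, hence the factor of two.
    """
    return 2 * n_one_sample(effect_size, alpha, power)


n_one = n_one_sample(0.1)
n_two = n_two_sample_per_group(0.1)

print(f"one-sample n:           {n_one:.1f}")
print(f"two-sample n per group: {n_two:.1f}")
```

With the same inputs as in the question (effect size 0.1, alpha 0.05, power 0.9), the two numbers land close to the ~1050 and ~2100 reported by the solve_power calls.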
For more information, there are a lot of resources on this site and online related to one sample vs two sample tests.