Determining sample sizes for an A/B test using z-test and chi-square test

chi-squared-test, hypothesis testing, python, statistical-power, z-test

Assumptions

I consider an A/B test where there is a control group and a variant group. Each observation can either be true (converted) or false (not converted).
I split the incoming users evenly and at random between the two treatments.
So, the results can be summarized in a contingency table:

|       | Converted | Not converted |
|-------|-----------|---------------|
|Control|           |               |
|Variant|           |               |

Let the conversion rate be Converted / (Converted + Not Converted).
The null hypothesis is that the conversion rate is independent of the treatment.

It seems like in this case, I can use either the two-tailed $z$-test or the $\chi^2$-test. Feel free to correct me on this one.
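As a quick check that the two tests agree on a 2x2 table, here is a minimal sketch. The counts are hypothetical and chosen only for illustration. It assumes the pooled two-proportion z-test (`proportions_ztest` from statsmodels) and Pearson's chi-square test without the continuity correction (`chi2_contingency` from SciPy with `correction=False`); under those assumptions the squared z statistic should equal the chi-square statistic, and the two-sided p-values should match.

```python
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts, for illustration only.
# Rows: Control, Variant. Columns: Converted, Not converted.
table = np.array([[200, 1800],
                  [250, 1750]])

converted = table[:, 0]        # conversions per group
totals = table.sum(axis=1)     # users per group

# Two-sided two-proportion z-test (pooled variance under the null).
z_stat, z_pvalue = proportions_ztest(converted, totals)

# Pearson chi-square test of independence, without Yates' continuity correction.
chi2_stat, chi2_pvalue, dof, expected = chi2_contingency(table, correction=False)

print(f"z^2   = {z_stat ** 2:.4f}, p = {z_pvalue:.6f}")
print(f"chi^2 = {chi2_stat:.4f}, p = {chi2_pvalue:.6f}")
```

With the continuity correction left on (the SciPy default for 2x2 tables), the chi-square statistic is slightly smaller and the two results no longer coincide exactly.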

Determining the sample size

I want to use statsmodels.stats.power.GofChisquarePower.solve_power and statsmodels.stats.power.NormalIndPower.solve_power.
For example:

import statsmodels.stats.power as power

zpower = power.NormalIndPower()
chipower = power.GofChisquarePower()
# nobs1 is the number of observations in sample 1
zpower.solve_power(effect_size=0.1, nobs1=None, alpha=0.05, power=0.9, ratio=1.)  # Returns ~2100
# nobs is the total number of observations
chipower.solve_power(effect_size=0.1, nobs=None, alpha=0.05, power=0.9)  # Returns ~1050

Question: I am puzzled by the huge difference. What is the reason for it? Am I using something wrongly in regards to my assumptions?

N.B. I now realize that the documentation states that GofChisquarePower.solve_power

(solves) for any one parameter of the power of a one sample chisquare-test

and NormalIndPower.solve_power

(solves) for any one parameter of the power of a two sample z-test

What is the difference between a one-sample and a two-sample test?

Best Answer

Simply put:

  1. A one-sample test checks whether the mean of the population a sample was drawn from ($\mu$) equals a known reference value ($\mu_0$). Think about testing the height of a sample of females against the average female height according to the latest census.

  2. A two-sample test compares the mean of one group ($\mu_1$) with the mean of another, independent group ($\mu_2$), where both means are estimated from samples.

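This is why the required sample size per group for a two-sample test is double the required sample size for a one-sample test: both means are estimated from data, so with equal group sizes and variances the difference between them has twice the variance of a single sample mean. A/B tests should use a two-sample test.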
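The factor of two can be checked numerically. The sketch below assumes the usual normal-approximation formula for a two-sided test, $n = \left((z_{1-\alpha/2} + z_{\text{power}})/d\right)^2$ for one sample, and compares twice that value with what `NormalIndPower.solve_power` returns for equal group sizes. The variable names are illustrative; the effect size, significance level, and power are the ones from the question.

```python
from scipy.stats import norm
from statsmodels.stats.power import NormalIndPower

effect_size = 0.1     # standardized effect size, as in the question
alpha = 0.05          # two-sided significance level
target_power = 0.9

# Normal-approximation sample size for a two-sided one-sample z-test.
z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(target_power)
n_one_sample = ((z_alpha + z_power) / effect_size) ** 2

# With two independent, equally sized groups the variance of the difference
# in means doubles, so each group needs twice as many observations.
n_per_group = 2 * n_one_sample

# statsmodels' two-sample solver; nobs1 is the size of the first group.
n_statsmodels = NormalIndPower().solve_power(
    effect_size=effect_size, nobs1=None, alpha=alpha,
    power=target_power, ratio=1.0, alternative="two-sided",
)

print(f"one-sample n:              {n_one_sample:.1f}")
print(f"two-sample n per group:    {n_per_group:.1f}")
print(f"statsmodels n per group:   {n_statsmodels:.1f}")
```

The one-sample and per-group values correspond to the figures of roughly 1050 and 2100 reported in the question. Note that the two-sample figure is per group, so the experiment as a whole needs about twice that many users.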

For more information, there are a lot of resources on this site and online related to one sample vs two sample tests.