Solved – Sample size for a very skewed A/B Test

ab-test, p-value, sample-size, skewness

I would like to perform an A/B test on my website. I know how to compute basic test statistics, but I'm not sure how to choose the sample size. In particular, suppose the event has a very low conversion rate before the campaign, say p = 1/10^6. To apply the central limit theorem (C.L.T.) to p at a given confidence level, I guess I need a much larger sample size (for both the control sample and the treatment sample). How can I find the best sample size, given such a skewed distribution of p?

Best Answer

If one wants to run an A/B test with a small base rate, and not just for funsies, one has to ask what effect size, i.e. which absolute improvement, is considered worth the effort.

For example, if p=1/10^6 and number-of-visitors-per-month=10^6, the baseline is 1 conversion per month on average, so even a relative improvement of 500 % means an absolute improvement of only 5 more conversions per month on average (6 instead of 1). Unless such differences can be justified with monetary arguments (e.g. the website is selling trips to space), an A/B test is not worth the trouble.
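To put a rough number on "how much data", here is a minimal sketch of the textbook sample-size formula for a two-sided two-proportion z-test. The function name `sample_size_per_arm`, the 5 % significance level and the 80 % power are my own assumptions, not something taken from the question. The formula relies on the normal approximation, which is shaky at such low conversion counts, so treat the result as an order of magnitude only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; alpha and power are assumed defaults.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_pooled = (p_control + p_treatment) / 2       # pooled rate under H0 (equal arms)
    spread_h0 = sqrt(2 * p_pooled * (1 - p_pooled))
    spread_h1 = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    effect = p_treatment - p_control
    return ceil((z_alpha * spread_h0 + z_power * spread_h1) ** 2 / effect ** 2)


baseline = 1e-6
# Relative improvement of 500 %: the treatment rate is six times the baseline.
n_large_lift = sample_size_per_arm(baseline, 6 * baseline)
# Relative improvement of 10 %: a more typical effect size in A/B testing.
n_small_lift = sample_size_per_arm(baseline, 1.1 * baseline)

print(f"+500 % lift: {n_large_lift:,} visitors per arm")
print(f"+10 % lift:  {n_small_lift:,} visitors per arm")
```

Under these assumptions the +500 % case already needs visitors per arm on the order of millions, and the +10 % case on the order of a billion, which is why the effect size has to be settled before anything else.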

However, if such differences are considered worth the effort, I suggest breaking down (decomposing) the conversion rate into its contributing factors. For example, let's say that one measures conversions as $\frac{boughtSpaceTrips}{siteVisitors}$. This rate can be split into ...

$\frac{boughtSpaceTrips}{siteVisitors} = \frac{boughtSpaceTrips}{spaceTripsInBasket} \cdot \frac{spaceTripsInBasket}{siteVisitors}$

This decomposition may allow one to detect differences in one of the decomposed ratios that do not show up in the composed ratio, either because they are offset by the other ratios (negative correlation) or because their contribution weight is so small that detecting them in the composed ratio would require a huge amount of data. Whether there is some sort of negative correlation between the decomposed factors can be judged with domain knowledge, for example by asking how much it "costs" the user to perform a certain action.

In the given constructed example, the reasoning

Improve $\frac{boughtSpaceTrips}{spaceTripsInBasket}$ => Improve $\frac{boughtSpaceTrips}{siteVisitors}$

is valid, but the corresponding reasoning for the other factor

Improve $\frac{spaceTripsInBasket}{siteVisitors}$ => Improve $\frac{boughtSpaceTrips}{siteVisitors}$

is not.
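A small numeric illustration of why the second implication can fail: the counts below are invented for this sketch (they are not data from any real site), and `funnel_rates` is a hypothetical helper. In the treatment arm twice as many trips land in the basket, but the purchase rate per basket halves, so the composed conversion rate does not move at all.

```python
from fractions import Fraction


def funnel_rates(visitors, baskets, purchases):
    """Return (baskets per visitor, purchases per basket, purchases per visitor)."""
    basket_rate = Fraction(baskets, visitors)
    purchase_rate = Fraction(purchases, baskets)
    # The composed rate is the product of the two decomposed ratios.
    return basket_rate, purchase_rate, basket_rate * purchase_rate


# Invented counts: same traffic and same purchases, but twice as many baskets.
control = funnel_rates(visitors=10_000_000, baskets=10_000, purchases=10)
treatment = funnel_rates(visitors=10_000_000, baskets=20_000, purchases=10)

for name, (basket_rate, purchase_rate, composed) in (("control", control),
                                                     ("treatment", treatment)):
    print(f"{name}: basket rate {float(basket_rate):.4f}, "
          f"purchase rate {float(purchase_rate):.4f}, composed {float(composed):.1e}")
```

Here the two factors are perfectly negatively correlated by construction. Real data will rarely be that clean, which is where the domain knowledge mentioned above comes in.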


If the decomposition does not lead to more feasible base rates, take a look at the statistical literature on this kind of problem (keyword: "rare events"). In that case, though, you go beyond the scope of ordinary A/B tests, so I would ask again whether this is worth the effort. As an aside, my intuition tells me that one cannot get around the pillars of the universe: rare events still require a lot of data (though maybe not a ton), no matter which fancy method is applied (domain knowledge may help a lot, though).
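As one possible example of a rare-event method (my choice for illustration, not necessarily what the literature would recommend for your case): when conversions are this rare, the counts in each arm are approximately Poisson, and one can condition on the total number of conversions. Under the null hypothesis of equal rates, the count in arm A is then binomial with success probability equal to arm A's share of the traffic, so an exact p-value is available without the C.L.T. The function name `exact_conditional_p_value` and the counts are hypothetical.

```python
from math import comb


def exact_conditional_p_value(conversions_a, conversions_b, visitors_a, visitors_b):
    """Two-sided exact conditional test for equal conversion rates in two arms.

    Assumes rare events, so the counts are treated as Poisson. Given the total
    number of conversions, the count in arm A is Binomial(total, share_a) under H0.
    """
    total = conversions_a + conversions_b
    share_a = visitors_a / (visitors_a + visitors_b)
    pmf = [comb(total, k) * share_a ** k * (1 - share_a) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[conversions_a]
    # Sum the probabilities of all outcomes at most as likely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Invented counts: 3 vs 12 conversions with three million visitors in each arm.
p = exact_conditional_p_value(3, 12, 3_000_000, 3_000_000)
print(f"exact conditional p-value: {p:.4f}")
```

Note that the test only sees the 15 conversions, not the six million visitors, which fits the intuition above: the method is exact, but it still needs enough rare events to say anything.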
