Confidence Interval – Difference Between G-test and t-test for Effective A/B Testing

ab-test, confidence interval

The G-test is a likelihood-ratio test whose statistic approximately follows a chi-squared distribution, and it is recommended by the author of this well-known A/B test tutorial.

This tool assumes a normal distribution and uses difference of means to compute confidence.

What is the difference between a G-test and a t-test? What are the benefits and downsides of using each method to measure the effectiveness of our A/B tests?

I'm trying to figure out which one I should use to measure the results of my A/B test framework. Our framework has two general use cases: (1) split the visitors evenly, show each group a different feature, and measure their conversion on some other page (say, the sign-up page); and (2) split the visitors into a control group (90%) and an experimental group (10%), and measure conversions on some other page.

Our website gets between 1,000 and 200,000 visits per day. These visits are spread across about 300 pages, following roughly an exponential distribution.

Thanks,
Kevin

Best Answer

In general, the test that relies on fewer approximations when computing its test statistic is preferable, although all of them converge to the same result as the sample size increases.

So, since A/B-tests generally focus on binary outcomes, ...

Short answer:

Use the G-test, because it is less approximate.

Long answer:

In an A/B test with unequal sample sizes and unequal variances, the t-test approximates the distribution of the difference between the two group means with a t-distribution, which is itself questionable. The two underlying distributions may be unknown, but it is assumed that their mean and variance are sufficient to describe them (otherwise any conclusion would not help much), which is of course true for the normal distribution.
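To make the t-test approximation concrete, here is a minimal sketch of the unequal-variance (Welch) t statistic applied to 0/1 conversion outcomes, together with the Welch-Satterthwaite degrees of freedom. The function name `welch_t_binary` and the visitor counts are hypothetical and only illustrate a 90/10 split like the one in the question.

```python
import math

def welch_t_binary(conv_a, n_a, conv_b, n_b):
    """Welch t statistic and approximate degrees of freedom for 0/1 outcomes."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unbiased sample variance of a 0/1 variable: n / (n - 1) * p * (1 - p)
    var_a = p_a * (1 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1 - p_b) * n_b / (n_b - 1)
    # Squared standard errors of the two group means
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (p_b - p_a) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical data: 9000 control visitors, 1000 experimental visitors
t, df = welch_t_binary(conv_a=450, n_a=9000, conv_b=65, n_b=1000)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Note that the statistic uses only the two conversion rates and the group sizes, which is exactly the "mean and variance are sufficient" assumption described above.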

In the special case of a binary outcome, the binomial distribution can be approximated by a normal distribution with $\mu = np$, $\sigma^2 = np(1-p)$, which is valid for $np(1-p) \geq 9$ (a rule of thumb; $n$ = number of trials, $p$ = success rate).
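The rule of thumb can be checked directly for a given traffic level and conversion rate. The sketch below is only illustrative: the helper names (`normal_approx_ok`, `min_trials`) and the example rates are assumptions, not part of the original answer.

```python
import math

def normal_approx_ok(n, p, threshold=9.0):
    """True if n * p * (1 - p) meets the rule-of-thumb threshold."""
    return n * p * (1 - p) >= threshold

def min_trials(p, threshold=9.0):
    """Smallest n for which n * p * (1 - p) >= threshold."""
    return math.ceil(threshold / (p * (1 - p)))

# Hypothetical scenarios: (visitors in a group, conversion rate)
for n, p in [(100, 0.05), (1000, 0.05), (200000, 0.02)]:
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    print(f"n={n}, p={p}: mu={mu:.1f}, sigma={sigma:.2f}, ok={normal_approx_ok(n, p)}")

print("minimum n at p=0.05:", min_trials(0.05))
```

With a low conversion rate, a small experimental group (for example the 10% arm on a low-traffic page) can fall below the threshold even when the site as a whole has plenty of visits.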

So, in summary, although it is OK to apply the t-test, doing so requires two approximations to transform the binomial case into a more generic one. This is unnecessary here, since less approximate tests such as the G-test or (even better) Fisher's exact test are available for this special case. Fisher's exact test should be applied especially if the sample size is less than or equal to 20 (another rule of thumb), but I guess this does not matter in a solid A/B test.
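As a rough illustration of the two recommended tests, the sketch below computes the G statistic and a two-sided Fisher's exact p-value for a 2x2 table of conversions using only the Python standard library. The function names and the small example table are hypothetical. The G-test p-value relies on the fact that, for one degree of freedom, the chi-squared survival function equals `erfc(sqrt(g / 2))`. The two-sided Fisher p-value here sums all tables that are no more likely than the observed one, which is one common convention.

```python
import math

def g_test_2x2(conv_a, n_a, conv_b, n_b):
    """G statistic and chi-squared (1 degree of freedom) p-value for a 2x2 table."""
    total = n_a + n_b
    col_totals = (conv_a + conv_b, total - conv_a - conv_b)
    rows = ((conv_a, n_a - conv_a), (conv_b, n_b - conv_b))
    g = 0.0
    for row, n_row in zip(rows, (n_a, n_b)):
        for obs, col_total in zip(row, col_totals):
            if obs > 0:  # an empty cell contributes nothing to the sum
                expected = n_row * col_total / total
                g += 2.0 * obs * math.log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    return g, math.erfc(math.sqrt(g / 2.0))

def fisher_exact_2x2(conv_a, n_a, conv_b, n_b):
    """Two-sided Fisher's exact p-value with both margins held fixed."""
    total_conv = conv_a + conv_b
    denom = math.comb(n_a + n_b, total_conv)
    lo, hi = max(0, total_conv - n_b), min(n_a, total_conv)
    # Hypergeometric probability of k conversions landing in group A
    probs = {k: math.comb(n_a, k) * math.comb(n_b, total_conv - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[conv_a]
    # Sum every table that is at most as likely as the observed one
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))

# Hypothetical small sample: 3 of 10 conversions versus 8 of 10
g, p_g = g_test_2x2(3, 10, 8, 10)
p_f = fisher_exact_2x2(3, 10, 8, 10)
print(f"G = {g:.3f}, G-test p = {p_g:.4f}, Fisher p = {p_f:.4f}")
```

On a table this small the two p-values should differ noticeably, which is the situation where the exact test is preferred; with thousands of visitors per group they are expected to be close.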
