Hypothesis Testing – Hypothesis Testing for Two Proportions in A/B Testing

ab-test, hypothesis testing, proportion, z-test

I was trying to understand the statistical tests to use during A/B testing and stumbled upon this documentation of the Amazon A/B testing SDK:

https://developer.amazon.com/public/apis/manage/ab-testing/doc/math-behind-ab-testing

From what I understood, when doing a Z-test for two proportions, a pooled proportion is used to calculate the standard error, since we assume the null hypothesis is true (there is no difference between the variations, so the standard error should be based on the pooled proportion). Why does the formula for the standard error here not use a pooled proportion? What is the consequence of that choice?

Best Answer

I believe your instincts are correct. The question of pooled vs. unpooled was discussed in another question previously. Generally speaking, statistics textbooks will usually say that the only time you would use the unpooled proportion is when you believe there is already a difference in the population proportions.
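For reference, the two standard errors under discussion can be written as follows. The notation is introduced here for illustration and is not taken from the linked page: x_1 and x_2 are the success counts, n_1 and n_2 the sample sizes, and \hat{p}_1 = x_1/n_1, \hat{p}_2 = x_2/n_2 the sample proportions.

```latex
\begin{align*}
\hat{p} &= \frac{x_1 + x_2}{n_1 + n_2} \\
SE_{\text{pooled}} &= \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
SE_{\text{unpooled}} &= \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \\
z &= \frac{\hat{p}_1 - \hat{p}_2}{SE}
\end{align*}
```

The pooled version estimates a single common proportion, which is what the null hypothesis of no difference implies; the unpooled version lets each variation keep its own variance.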

I assume the link you provide is meant as an overview, but it leaves much to be desired. From what I have read, you can use the unpooled estimate of the standard error when constructing confidence intervals, but should use the pooled estimate for the test statistic, as discussed here. That said, the page doesn't actually present a confidence interval; it only lists a "confidence", which one can only assume is 1 - p-value. Given the precision reported, it is difficult to know for sure which standard error was used. Replicating the analysis in R shows that both calculations lead to very similar results (identical after rounding), and the similar (and large) sample sizes also alleviate some concerns about using the unpooled method. It is worth noting that they must also have done a one-tailed test, assuming variation B wouldn't increase conversions, which may not be an appropriate assumption.

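# Observed sample sizes and proportions for the two variations
n1=1064
p1=320/n1
n2=1043
p2=250/n2

# Pooled: a single common proportion, as implied by the null hypothesis
p = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt((p*(1-p))*((1/n1)+(1/n2)))
z = (p1-p2)/se
pnorm(abs(z))  # one-tailed "confidence", i.e. 1 - one-sided p-value
[1] 0.9991959

# Unpooled: each variation contributes its own variance
pa = 320/1064
pb = 250/1043
sea = (pa*(1-pa))/n1  # squared standard error (variance) of pa
seb = (pb*(1-pb))/n2  # squared standard error (variance) of pb
z2 = ((pa-pb)/sqrt(sea+seb))
pnorm(abs(z2))
[1] 0.9992223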
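
The values above are one-tailed. As a minimal sketch, the same two statistics can be wrapped in a helper that also returns two-sided p-values, cross-checked against base R's prop.test() with the continuity correction turned off, and extended with a 95% Wald interval built on the unpooled standard error. The helper name two_prop_z and its arguments are hypothetical and chosen only for this illustration; nothing here is taken from the Amazon documentation.

```r
# Hypothetical helper (name and arguments are illustrative):
# z-test for two proportions with a pooled or unpooled standard error.
two_prop_z <- function(x1, n1, x2, n2, pooled = TRUE) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  if (pooled) {
    p  <- (x1 + x2) / (n1 + n2)                 # common proportion under H0
    se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  } else {
    se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  }
  z <- (p1 - p2) / se
  list(diff = p1 - p2, se = se, z = z,
       p_one_sided = pnorm(abs(z), lower.tail = FALSE),
       p_two_sided = 2 * pnorm(abs(z), lower.tail = FALSE))
}

res_pooled   <- two_prop_z(320, 1064, 250, 1043, pooled = TRUE)
res_unpooled <- two_prop_z(320, 1064, 250, 1043, pooled = FALSE)

# Cross-check: without continuity correction, prop.test's chi-squared
# statistic is the square of the pooled z statistic.
pt <- prop.test(c(320, 250), c(1064, 1043), correct = FALSE)

# 95% Wald interval for p1 - p2, using the unpooled standard error
ci <- res_unpooled$diff + c(-1, 1) * qnorm(0.975) * res_unpooled$se

print(c(z_pooled = res_pooled$z, z_unpooled = res_unpooled$z,
        z_prop_test = sqrt(unname(pt$statistic))))
print(c(p_two_sided_pooled = res_pooled$p_two_sided,
        p_two_sided_unpooled = res_unpooled$p_two_sided))
print(ci)
```

With these data both versions reject at conventional levels whether the test is one- or two-sided, so the choice of standard error does not change the conclusion here.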

As a final point, however, some apparently do state that the unpooled test is more powerful and prefer to use it instead, as noted in this reference manual for WinCross on page 42. So there does not appear to be a settled consensus on this, at least to my knowledge (I would be happy to be proven wrong, though).
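
One way to explore the power claim is a small simulation. The following is only a sketch under assumed settings: the sample size, the true proportions, the number of replications and the seed are all illustrative choices, not values from the WinCross manual or the Amazon page. It estimates how often each statistic rejects at a two-sided 5% level, both when the null hypothesis is true and when it is false.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated two-sided rejection rates for the pooled and unpooled z statistics.
# Equal group sizes n are assumed to keep the sketch short.
reject_rates <- function(n, p1, p2, reps = 5000, alpha = 0.05) {
  x1  <- rbinom(reps, n, p1)
  x2  <- rbinom(reps, n, p2)
  ph1 <- x1 / n
  ph2 <- x2 / n
  pp  <- (x1 + x2) / (2 * n)  # pooled proportion
  z_pooled   <- (ph1 - ph2) / sqrt(pp * (1 - pp) * 2 / n)
  z_unpooled <- (ph1 - ph2) / sqrt(ph1 * (1 - ph1) / n + ph2 * (1 - ph2) / n)
  crit <- qnorm(1 - alpha / 2)
  c(pooled   = mean(abs(z_pooled) > crit),
    unpooled = mean(abs(z_unpooled) > crit))
}

null_rates <- reject_rates(n = 200, p1 = 0.25, p2 = 0.25)  # H0 true: Type I error
alt_rates  <- reject_rates(n = 200, p1 = 0.30, p2 = 0.24)  # H0 false: power
print(rbind(null = null_rates, alternative = alt_rates))
```

With equal group sizes the unpooled variance estimate is never larger than the pooled one, so the unpooled statistic rejects at least as often in both rows. Any apparent gain in power should therefore be read alongside the rejection rate under the null, and the picture may differ with unequal group sizes.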

Ultimately what it comes down to is clearly reporting what you did so it can be reviewed and interpreted appropriately.
