Hypothesis Testing – Two Sample Independent Test for Proportions: Z Test vs T Test

Tags: ab-test, hypothesis-testing, t-test, z-test

Context:

I have two independent samples that I want to compare for equality through an AB test.
The metric being evaluated is binary: user clicks or not (proportions).

Question:

  1. I keep reading that z tests should not be used because of the assumptions they make, and that t tests are recommended instead. I haven't seen a formula for a two independent-samples t test for proportions, only for means. Does one exist?

  2. Is it even true that the two-sample z test for proportions is not recommended?

Best Answer

There are several nearly equivalent tests to compare two binomial proportions. The parameter of interest is the difference between the two population proportions $p_1 - p_2,$ which is usually estimated by the difference between the corresponding two sample proportions $\hat p_1 - \hat p_2,$ where $\hat p_i = x_i/n_i,$ for $i=1,2,$ with $x_i$ successes in $n_i$ trials.

Differences among the tests center on how or whether to use a normal approximation and on how to estimate the standard deviation of $\hat p_1 - \hat p_2,$ often called the (estimated) standard error. Roughly speaking,

  • One method is to assume equality of $p_1$ and $p_2,$ estimating the common value $p = p_1 = p_2$ as $\hat p = \frac{x_1+x_2}{n_1+n_2}$ and using $\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2) = \hat p(1-\hat p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right).$

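  • An alternative method is to estimate the variances of the $\hat p_i$ separately and add them: $\widehat{\mathrm{Var}}(\hat p_1 - \hat p_2) = \frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}.$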
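As a minimal sketch in R, the two standard-error estimates can be computed and turned into two-sided z tests. The helper se_diff is hypothetical (not part of base R), and the counts (83 successes out of 100 in A, 88 out of 92 in B) are those of the prop.test example in this answer.

```r
# Hypothetical helper (not a base R function): estimated standard error of
# p1_hat - p2_hat, either pooled (assuming p1 = p2) or unpooled.
se_diff <- function(x, n, pooled = TRUE) {
  if (pooled) {
    p <- sum(x) / sum(n)                 # pooled estimate (x1 + x2) / (n1 + n2)
    sqrt(p * (1 - p) * sum(1 / n))       # sqrt(p(1 - p)(1/n1 + 1/n2))
  } else {
    p_hat <- x / n                       # separate sample proportions x_i / n_i
    sqrt(sum(p_hat * (1 - p_hat) / n))   # add the two estimated variances
  }
}

x <- c(83, 88)    # successes in groups A and B
n <- c(100, 92)   # trials in groups A and B
diff_hat <- x[1] / n[1] - x[2] / n[2]

z_pooled   <- diff_hat / se_diff(x, n, pooled = TRUE)
z_unpooled <- diff_hat / se_diff(x, n, pooled = FALSE)

# Two-sided P-values from the normal approximation
p_values <- 2 * pnorm(-abs(c(pooled = z_pooled, unpooled = z_unpooled)))
p_values
```

With these counts the unpooled standard error is a little smaller than the pooled one, so its z statistic is a little larger in magnitude; both P-values are nevertheless below 0.01.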

Moreover, especially for small $n_i,$ some tests apply one of several continuity corrections when invoking a normal approximation, while other tests use no continuity correction at all.
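One way to see the effect of a continuity correction is to run R's prop.test with and without it; its correct argument defaults to TRUE. A minimal sketch, reusing the example counts from this answer:

```r
x <- c(83, 88)    # successes in groups A and B
n <- c(100, 92)   # trials in groups A and B

uncorrected <- prop.test(x, n, correct = FALSE)
corrected   <- prop.test(x, n, correct = TRUE)   # continuity-corrected (the default)

# Chi-squared statistics and P-values side by side
rbind(statistic = c(uncorrected = unname(uncorrected$statistic),
                    corrected   = unname(corrected$statistic)),
      p.value   = c(uncorrected = unname(uncorrected$p.value),
                    corrected   = unname(corrected$p.value)))
```

The corrected statistic is smaller, so its P-value is larger; the choice matters most when a P-value lies close to a chosen significance level.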

Fortunately, these variations in method often make very little difference in final results. So it is more important to remember that the variations exist (so as not to be puzzled when various analyses do not match exactly) than to worry about which one to use.

In R, the procedure prop.test uses a test statistic with an approximate chi-squared distribution. Suppose that there are $n_1 = 100$ subjects in the A group with $x_1 = 83$ successes and, independently, $n_2 = 92$ subjects in the B group with $x_2 = 88$ successes, so that $\hat p_1 = 0.83$ and $\hat p_2 \approx 0.9565.$ These two sample proportions differ significantly at the $1\%$ level because the P-value of the test is smaller than $0.01.$

prop.test(c(83,88), c(100,92), cor=F)

        2-sample test for equality of proportions 
        without continuity correction

data:  c(83, 88) out of c(100, 92)
X-squared = 7.8742, df = 1, p-value = 0.005015
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.21111962 -0.04192386
sample estimates:
   prop 1    prop 2 
0.8300000 0.9565217 
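To connect this output to the z test in the question, here is a short check in base R. It assumes prop.test's usual two-sample computations: the X-squared statistic should be the square of the pooled two-sample z statistic (so the two-sided z test and this chi-squared test give the same P-value), while the 95% interval should be the unpooled (Wald) interval. Treat it as a sketch for verifying that reading, not as documentation of prop.test's internals.

```r
x <- c(83, 88)    # successes in groups A and B
n <- c(100, 92)   # trials in groups A and B
res <- prop.test(x, n, correct = FALSE)

p_hat    <- x / n
p_pool   <- sum(x) / sum(n)
diff_hat <- p_hat[1] - p_hat[2]

# Pooled two-sample z statistic; its square should match X-squared
z <- diff_hat / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
all.equal(as.numeric(res$statistic), z^2)
all.equal(as.numeric(res$p.value), 2 * pnorm(-abs(z)))

# The reported interval should use the unpooled standard error
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci <- diff_hat + c(-1, 1) * qnorm(0.975) * se_unpooled
all.equal(as.numeric(res$conf.int), ci)
```

In other words, for a two-sided alternative the classical two-sample z test for proportions and prop.test without continuity correction are the same test.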

Notice that this test is the same as a chi-squared test of homogeneity on the $2 \times 2$ table where the columns are for A and B and the rows are for Success and Failure. (Particularly with sample sizes around 100 or larger, I choose to use the argument cor=F, short for correct=FALSE, to suppress the continuity correction.)

TAB = rbind(c(83,88), c(17,4));  TAB
     [,1] [,2]
[1,]   83   88
[2,]   17    4

chisq.test(TAB, cor=F)

        Pearson's Chi-squared test

data:  TAB
X-squared = 7.8742, df = 1, p-value = 0.005015

The P-value is exactly the same as for the two-sided test from prop.test (above), but chisq.test reports neither a confidence interval nor the estimates $\hat p_i.$
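A small sketch that makes the comparison programmatic: build the same table, run both functions, and compare the statistics and P-values. Only base R is assumed; row 1 of TAB holds successes and the column sums give the group sizes.

```r
TAB <- rbind(c(83, 88), c(17, 4))   # rows: Success, Failure; columns: A, B

chi <- chisq.test(TAB, correct = FALSE)
pt  <- prop.test(x = TAB[1, ], n = colSums(TAB), correct = FALSE)

# Same statistic and P-value from both functions
all.equal(as.numeric(chi$statistic), as.numeric(pt$statistic))
all.equal(as.numeric(chi$p.value), as.numeric(pt$p.value))

# Only prop.test reports the estimated proportions and a confidence interval
pt$estimate
pt$conf.int
```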

Notes: (1) If some of the expected counts for TAB are very small (thus triggering a warning message), it is best to use chisq.test with the argument sim=T to get a simulated P-value that may be more useful than the one from the traditional chi-squared approximation.
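In R, sim=T is a partial match for chisq.test's simulate.p.value argument, and B (default 2000) sets the number of simulated tables. The sketch below first inspects the expected counts (in current R versions the warning appears when any of them is below 5) and then computes a simulated P-value; the seed and B = 10000 are arbitrary choices for reproducibility, not values from the original answer.

```r
TAB <- rbind(c(83, 88), c(17, 4))   # rows: Success, Failure; columns: A, B

# Expected counts under homogeneity
chi <- chisq.test(TAB, correct = FALSE)
chi$expected
any(chi$expected < 5)   # TRUE would signal a doubtful chi-squared approximation

# Monte Carlo P-value (no continuity correction is applied when simulating)
set.seed(2024)
sim <- chisq.test(TAB, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

For the A/B example every expected count exceeds 10, so no warning is issued and the simulated P-value serves only as a cross-check.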

(2) Several other answers on this site discuss tests of binomial proportions. You may find additional examples and discussions of alternative tests there. Uses of alternative tests can also be found online.
