Solved – Significance test for two groups with dichotomous variable

statistical-significance, t-test

I have a 2×2 table with two independent groups of people that replied Yes or No in the survey:

        Yes    No
Group A 350    1250
Group B 1700   3800

Could you help me find a test that can be run on these figures to determine whether there is a statistically significant difference between the two groups?

Best Answer

BruceET provides one way of analyzing this table. There are several tests for 2 by 2 tables which are all asymptotically equivalent, meaning that with enough data all of these tests will give you the same answer. I present them here with R code for posterity.

In my answer, I'm going to transpose the table since I find it easier to have groups as columns and outcomes as rows.

The table is then

        Group A   Group B
Yes     350       1700
No      1250      3800

I'll reference the elements of this table as

        Group A   Group B
Yes     $a$       $b$
No      $c$       $d$

$N$ will be the sum of all the elements $N = a+b+c+d$.
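Purely for convenience, here is one way to pull these quantities out of the table in R; the later code blocks also define them inline, so this is just a reference sketch.

# the 2x2 table with groups as columns and outcomes as rows
m = matrix(c(350, 1250, 1700, 3800), nrow = 2)
a = m[1, 1]; b = m[1, 2]  # Yes counts for Group A and Group B
c = m[2, 1]; d = m[2, 2]  # No counts for Group A and Group B
N = sum(m)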

The Chi-Square Test

Perhaps the most common test for 2 by 2 tables is the chi-square test. Roughly, its null hypothesis is that the proportion of people who answer Yes is the same in each group, and in particular that it is the same as the proportion of Yes answers we would see if we ignored the groups completely.

The test statistic is

$$ X^2_P = \dfrac{(ad-bc)^2N}{n_1n_2m_1m_2} \sim \chi^2_1$$

Here $n_i$ are the column totals and $m_i$ are the row totals. This test statistic is asymptotically distributed as Chi square (hence the name) with one degree of freedom.

The math is not important, to be frank. Most software packages, like R, implement this test readily.

# enter the counts column by column: column 1 is Group A, column 2 is Group B
m = matrix(c(350, 1250, 1700, 3800), nrow = 2)
chisq.test(m, correct = FALSE)
    Pearson's Chi-squared test

data:  m
X-squared = 49.257, df = 1, p-value = 2.246e-12


Setting correct = FALSE makes R run the test exactly as I have written it, without the continuity correction that is useful for small samples. The p value is very small here, so we can conclude that the proportion of people answering Yes differs between the two groups.
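For comparison, leaving the default correct = TRUE applies Yates' continuity correction, which shrinks the statistic slightly; with counts this large the conclusion does not change.

chisq.test(m)  # Yates-corrected statistic is slightly smaller; same conclusion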

Test of Proportions

The test of proportions is similar to the chi square test. Let $\pi_i$ be the probability of answering Yes in group $i$. The test of proportions tests the null that $\pi_1 = \pi_2$.

In short, the test statistic for this test is

$$ z = \dfrac{p_1-p_2}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim \mathcal{N}(0,1) $$

Again, $n_i$ are the column totals, $p_1 = a/n_1$ and $p_2 = b/n_2$ are the sample proportions, and $\bar{p} = m_1/N$ is the proportion of Yes answers pooled across both groups (this pooled version is what R implements; a variant uses the two unpooled variances instead). The test statistic has a standard normal asymptotic distribution. If your alternative is $\pi_1 \neq \pi_2$, you reject the null at the usual 5% level when the statistic is larger than 1.96 in absolute value.

In R

# Note that the n argument is the column sums

prop.test(x = c(350, 1700), n = c(1600, 5500), correct = FALSE)
data:  c(350, 1700) out of c(1600, 5500)
X-squared = 49.257, df = 1, p-value = 2.246e-12
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.11399399 -0.06668783
sample estimates:
   prop 1    prop 2 
0.2187500 0.3090909 

Note that the X-squared statistic in the output of this test is identical to that of the chi-square test. There is a good reason for that: the square of the pooled z statistic above is algebraically the Pearson chi-square statistic. Note also that this test provides a confidence interval for the difference in proportions, which is an added benefit over the chi-square test.
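To make that equivalence concrete, here is a quick sketch computing the pooled z statistic by hand; its square should reproduce the X-squared value reported above.

# pooled two-proportion z statistic, computed by hand
p1 = 350/1600
p2 = 1700/5500
p_bar = (350 + 1700)/(1600 + 5500)  # pooled proportion of Yes answers
z = (p1 - p2)/sqrt(p_bar*(1 - p_bar)*(1/1600 + 1/5500))
z^2  # about 49.257, matching chisq.test and prop.test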

Fisher's Exact Test

Fisher's exact test conditions on the quantities $n_1 = a+c$ and $m_1 = a + b$. The null of this test is that the probability of success is the same in each group, $\pi_1 = \pi_2$, like the test of proportions. (The actual null hypothesis in the derivation of the test concerns the odds ratio, but that is not important now.)

The exact probability of observing the table provided is

$$ p = \dfrac{n_1! n_2! m_1! m_2!}{N! a! b! c! d!} $$

John Lachin writes

Thus, the probability of the observed table can be considered to arise from a collection of $N$ subjects of whom $m_1$ have positive response, with $a$ of these being drawn from the $n_1$ subjects in group 1 and $b$ from among the $n_2$ subjects in group 2 ($a+b=m_1$, $n_1 + n_2 = N$).

Importantly, this is not the p value; it is only the probability of observing this particular table. To compute the p value, we need to sum the probabilities of all tables at least as extreme as the one observed.
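The formula above is just the hypergeometric probability mass function, so the computation can be sketched directly with dhyper; summing over tables no more probable than the observed one should agree closely with what fisher.test reports.

# probability of the observed table: a = 350 Yes responders in Group A,
# drawing n1 = 1600 subjects from N = 7100 of whom m1 = 2050 said Yes
p_obs = dhyper(350, 2050, 5050, 1600)

# two-sided p value: sum over all tables no more probable than the observed one
probs = dhyper(0:1600, 2050, 5050, 1600)
sum(probs[probs <= p_obs])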

Luckily, R does this for us

m = matrix(c(350,1250, 1700, 3800), nrow=2)
fisher.test(m)

    Fisher's Exact Test for Count Data

data:  m
p-value = 1.004e-12
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.5470683 0.7149770
sample estimates:
odds ratio 
 0.6259224 

Note the result is about odds ratios and not about probabilities in each group. It is also worth noting, again from Lachin,

The Fisher-Irwin exact test has been criticized as being too conservative because other unconditional tests have been shown to yield a smaller p value and thus are more powerful.

When the data are large, this point becomes moot because you've likely got enough power to detect small effects, but it all depends on what you're trying to test (as it always does).


Thus far, we have examined what are likely the most common tests for this sort of data. The following tests are equivalent to the first two but perhaps less well known. I present them here for completeness.

Odds Ratio

The odds ratio for this table is $\widehat{OR} = ad/bc$, but because the odds ratio is constrained to be positive, it is often more convenient to work with the log odds ratio $\log(\widehat{OR})$.

Asymptotically, the sampling distribution for the log odds ratio is normal. This means we can apply a simple $z$ test. Our test statistic is

$$ Z = \dfrac{\log(\widehat{OR}) - \log(OR)}{\sqrt{\hat{V}\left(\log(\widehat{OR})\right)}} $$

Here, $\hat{V}(\log(\widehat{OR}))$ is the estimated variance of the log odds ratio, equal to $1/a + 1/b + 1/c + 1/d$. Under the null hypothesis of no association, $OR = 1$ and hence $\log(OR) = 0$, which is what the code below plugs in.

In R


m = matrix(c(350, 1250, 1700, 3800), nrow = 2)

# sample odds ratio: ad/(bc)
odds_ratio = m[1, 1]*m[2, 2]/(m[2, 1]*m[1, 2])
# estimated variance of the log odds ratio: 1/a + 1/b + 1/c + 1/d
vr = sum(1/m)
Z = log(odds_ratio)/sqrt(vr)

p.val = 2*pnorm(abs(Z), lower.tail = F)

which returns a Z value of -6.978754 and a p value of about 3e-12, far below any conventional threshold.
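Since we already have the standard error in hand, a 95% Wald interval for the odds ratio is one exponentiation away; this is just a sketch, and it should land close to (though not exactly on) the conditional interval that fisher.test reported earlier.

# 95% Wald confidence interval for the odds ratio, built on the log scale
exp(log(odds_ratio) + c(-1, 1)*qnorm(0.975)*sqrt(vr))
# roughly (0.55, 0.71), close to the fisher.test interval above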

Cochran's test

The test statistic is

$$ X^2_u = \dfrac{\left( \dfrac{n_2a-n_1b}{N} \right)^2}{\dfrac{n_1n_2m_1m_2}{N^3}} \sim \chi^2_1 $$

In R


a = 350
b = 1700
c = 1250
d = 3800
N = a + b + c + d

# column totals and row totals
n1 = a + c
n2 = b + d
m1 = a + b
m2 = c + d

X = ((n2*a - n1*b)/N)^2 / ((n1*n2*m1*m2)/N^3)

# Look familiar? Note that n2*a - n1*b = ad - bc, so this is
# algebraically the same as the Pearson chi-square statistic.
X
[1] 49.25663

p.val = pchisq(X, 1, lower.tail = F)
p.val
[1] 2.245731e-12

Conditional Mantel-Haenszel (CMH) Test

The CMH test (I have also seen this called the Cochran-Mantel-Haenszel test elsewhere) conditions on the first column total and the first row total.

The test statistic is

$$ X^2_c = \dfrac{\left( a - \dfrac{n_1m_1}{N} \right)^2}{\dfrac{n_1n_2m_1m_2}{N^2(N-1)}} \sim \chi^2_1$$

In R


a = 350
b = 1700
c = 1250
d = 3800
N = a + b + c + d
n1 = a + c
n2 = b + d
m1 = a + b
m2 = c + d

# squared deviation of cell a from its expected count under independence
top = (a - n1*m1/N)^2
bottom = (n1*n2*m1*m2)/(N^2*(N - 1))
X = top/bottom

X
[1] 49.24969

p.val = pchisq(X, 1, lower.tail = F)
p.val
[1] 2.253687e-12

Likelihood Ratio Test (LRT) (My Personal Favourite)

The LRT compares the log likelihood of a model which freely estimates the two group proportions against that of a model which estimates a single common proportion (not unlike the chi-square test). This test is a bit of overkill in my opinion, since the other tests are simpler, but hey, why not include it? I like it personally because the test statistic is oddly satisfying and easy to remember.

The math, as before, is irrelevant for our purposes. The test statistic is

$$ X^2_G = 2 \log \left( \dfrac{a^a b^b c^c d^d N^N}{n_1^{n_1} n_2^{n_2} m_1^{m_1} m_2^{m_2}} \right) \sim \chi^2_1 $$

In R, with some algebra applied on the log scale to prevent overflow



a = 350
b = 1700
c = 1250
d = 3800
N = a + b + c + d
n1 = a + c
n2 = b + d
m1 = a + b
m2 = c + d

# working with sums of x*log(x) keeps the enormous products a^a * b^b * ...
# from ever being formed explicitly
top = c(a, b, c, d, N)
bottom = c(n1, n2, m1, m2)
X = 2*(sum(top*log(top)) - sum(bottom*log(bottom)))

# Very close to the other tests
X
[1] 51.26845

p.val = pchisq(X, 1, lower.tail = F)
p.val
[1] 8.05601e-13

Note that there is a discrepancy between the LRT statistic and the other test statistics. It has been noted that the LRT statistic converges to its asymptotic chi-square distribution at a slower rate than the Pearson chi-square or Cochran test statistics.
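As an aside, the same statistic can be recovered from a binomial GLM: with one row of data per group, the model with a group effect is saturated, so the null deviance is exactly the likelihood ratio statistic. A small sketch:

# the LRT statistic as the null deviance of a binomial GLM
yes = c(350, 1700)
no = c(1250, 3800)
group = factor(c("A", "B"))
fit = glm(cbind(yes, no) ~ group, family = binomial)
fit$null.deviance  # about 51.268, matching the statistic above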

What Test Do I Use?

My suggestion: the test of proportions. It is equivalent to the chi-square test and has the added benefits of being (a) directly interpretable in terms of the risk difference, and (b) accompanied by a confidence interval for that difference (something you should always report).

I've not included the theoretical motivations for these tests; they are not essential to understand, though I personally find them captivating.

If you're wondering where I got all this information, the book "Biostatistical Methods: The Assessment of Relative Risks" by John Lachin takes a painstakingly long time to explain all of this in chapter 2.
