Hypothesis Testing – Why Not Always Use a Binomial Exact Test to Compare Two Proportions Instead of Chi Square?

binomial-distribution, chi-squared-test, hypothesis-testing, proportion, r

I am trying to figure out what test I should use in the following scenario: I know that there is a lot of room for improvement in a specific area at work. Being extremely critical, let's say that out of a sample of $52$ observations, $31$ could be improved. After instituting an improvement / QA program for six months, suppose that out of a sample of $55$ cases there are only $11$ with residual flaws. The two samples are independent. We are therefore comparing two proportions: $p_{\text{initial}} = \frac{31}{52}$ and $p_{\text{final}} = \frac{11}{55}$.

Although the numbers are exaggerated, I still want to see if the two proportions are statistically significantly different, and I think I have a couple of options: I can run an exact binomial test to calculate the probability that the new proportion of flawed observations, $\frac{11}{55}$, would occur if the actual underlying probability remained $\frac{31}{52}$. Alternatively, I can run a chi-squared test.

The chi-squared test is an approximation, and what I have read is that it is meant to be applied when the total number of observations is too high for an exact test. That is clearly not the case in this example; however, playing with the numbers in R, I saw no delay or problems with the exact binomial test even with counts above $10{,}000$, and there was no indication of any normal approximation being used.
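For example, a quick check along those lines (the counts below are made up just to stress the function) looks like this:

    # binom.test() computes exact binomial tail probabilities; the counts here
    # are hypothetical and only meant to show that it stays fast and exact
    # even when n is in the tens of thousands.
    res <- binom.test(c(2100, 10000 - 2100), p = 31/52, alternative = "less")
    res$method    # reports "Exact binomial test", i.e. no normal approximation
    res$p.value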

So, if this is all true, why shouldn't we always opt for an exact binomial test rather than a chi-squared test?

The R code for the two tests would be:

    # Exact binomial test:
    binom.test(c(11, 55 - 11), p = 31/52, alternative = "less")

    # Chi-squared test:
    prop.test(c(31, 11), c(52, 55), correct = FALSE, alternative = "greater")

Best Answer

You state that you have read the chi-squared test should be used when "the total number of observations is too high". I have never heard this. I don't believe it is true, although it is hard to say, since "too high" is quite vague. There is a standard recommendation not to use the chi-squared test when there are any cells with expected counts less than 5. This traditional warning is now known to be too conservative. Having an expected count less than 5 in a cell is not really a problem. Nonetheless, maybe what you heard is somehow related to that warning.
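For what it's worth, that rule of thumb is easy to check for the table implied by your numbers; this is just a quick sketch, using the expected counts that chisq.test() reports:

    # Expected cell counts for the 2x2 table in the question
    # (before: 31 flawed out of 52; after: 11 flawed out of 55).
    tab <- matrix(c(31, 52 - 31,
                    11, 55 - 11),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(period = c("before", "after"),
                                  outcome = c("flawed", "ok")))
    chisq.test(tab, correct = FALSE)$expected  # every cell is well above 5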

As @whuber notes, the two different tests you ask about make different assumptions about your data. The exact test assumes that the probability (31/52) is known a priori and without error. The chi-squared test estimates both the before and the after proportions, and treats both as subject to uncertainty due to sampling error.

Thus, the chi-squared test will have less power, but it is probably more honest. It may well be that the true proportion of flawed observations was considerably lower than 31/52, and the first sample only looked that bad by chance alone. You certainly may test whether the after proportion is less than 31/52, just as you may test it against any other value. But a significant result would not necessarily imply that the process improved after the QA program; you could only conclude that the proportion is less than an essentially arbitrary number.
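If you want to see the power difference concretely, here is a small Monte Carlo sketch. The "true" before and after proportions in it are made-up assumptions chosen only for illustration, not estimates from your data:

    # Monte Carlo sketch: how often does each test reject at alpha = 0.05 when
    # the "after" process really is better? The true proportions below are
    # illustrative assumptions, not estimates from the data in the question.
    set.seed(1)
    n1 <- 52; n2 <- 55           # sample sizes as in the question
    p1 <- 0.45; p2 <- 0.30       # hypothetical true before/after flaw rates
    alpha <- 0.05
    nsim <- 5000
    reject_exact <- reject_chisq <- logical(nsim)
    for (i in seq_len(nsim)) {
      x1 <- rbinom(1, n1, p1)    # "before" flaws, subject to sampling error
      x2 <- rbinom(1, n2, p2)    # "after" flaws
      # Exact test: pretends x1/n1 is the known, error-free null proportion
      reject_exact[i] <- binom.test(x2, n2, p = x1 / n1,
                                    alternative = "less")$p.value < alpha
      # Chi-squared test: treats both proportions as estimated with error
      reject_chisq[i] <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE,
                                   alternative = "greater")$p.value < alpha
    }
    mean(reject_exact)   # typically higher: more power, bought by ignoring
    mean(reject_chisq)   #   the sampling error in x1/n1

Rerunning the same sketch with p1 equal to p2 typically shows the exact test rejecting more often than the nominal 5%, which is the sense in which treating 31/52 as known without error is less honest.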
