R Prop.test – Addressing Chi-squared Approximation Errors

binomial distribution, chi-squared-test, proportion, r

I am trying to compare the proportions of two populations using prop.test.

My data are straightforward: the first population is 6/26 and the second is 15/171. I want to test whether the proportion in the first population is significantly greater than in the second.

When I run the prop.test in R, my code is:

prop.test(c(6, 15), c(26, 171), alternative = "greater")

However, I get a warning:

In prop.test(c(6, 15), c(26, 171), alternative = "greater") :
  Chi-squared approximation may be incorrect

My assumption is that this is based on the small sample size in the first population. Is that correct? I have read this post, which seems to indicate that indeed the issue is small sample size, but the solution provided isn't applicable with the prop.test.

Is there any way to correct for this?

If there is not, is there any way to get a sense of how much the inaccuracy of the chi-squared approximation may be affecting my p-value? In this case, the reported p-value is 0.03137. Can I assume that, even with the potential issue with the chi-squared approximation, I would still have 95% confidence, or not necessarily?

Best Answer

The warning appears because one of the expected cell counts in the chi-squared test is less than 5.

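a <- c(6, 15)                        # successes in each population
b <- c(26, 171)                      # sample sizes
m <- matrix(c(a, b - a), ncol = 2)   # 2x2 table: successes, failures
chisq.test(m)                        # reproduces the warning
chisq.test(m)$expected               # expected counts under the null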
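To see where the offending cell comes from, here is a minimal sketch that rebuilds the expected counts by hand as (row total × column total) / grand total. The names n, tab, and expected are illustrative and not part of the original answer.

```r
a <- c(6, 15)     # successes in each population
n <- c(26, 171)   # sample sizes
tab <- matrix(c(a, n - a), ncol = 2,
              dimnames = list(c("pop1", "pop2"), c("success", "failure")))

# Expected count in each cell under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# Cells that trip the "expected count < 5" rule of thumb
expected < 5
```

Only the success cell of the first population (about 2.8) falls below 5.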

However, that rule of thumb is a bit conservative, and there are other rules of thumb that you can consider. These data satisfy some of those rules and fail others.

As an alternative to the chi-squared test, there is also a binomial proportion test (a two-proportion z-test with a pooled standard error).

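p1 <- 6/26     # observed proportion, first population
n1 <- 26
p2 <- 15/171   # observed proportion, second population
n2 <- 171
p <- (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion
z <- (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))
z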
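The snippet above stops at the z statistic. As a minimal sketch of the remaining step, the one-sided p-value is the upper tail of the standard normal distribution. It should also match prop.test once the continuity correction is switched off with correct = FALSE (the default is correct = TRUE). The variable names below are illustrative.

```r
x <- c(6, 15)     # successes
n <- c(26, 171)   # sample sizes

p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# One-sided p-value for H1: first proportion is greater
p_z <- pnorm(z, lower.tail = FALSE)

# prop.test still emits the expected-count warning here, so it is suppressed
pt <- suppressWarnings(
  prop.test(x, n, alternative = "greater", correct = FALSE)
)

c(z = z, p_z = p_z, p_prop_test = unname(pt$p.value))
```

Without the correction, the two p-values agree, so the gap between this test and the default prop.test result comes from the continuity correction rather than from a different underlying statistic.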

Here we use a normal approximation to the binomial distribution. For this approximation, there is a rule of thumb that both $np > 5$ and $n(1-p) > 5$, which holds for both samples. Also, for these two proportions, the normal approximation looks reasonable to me when plotted.

hist(rbinom(10000, 26, 6/26))
hist(rbinom(10000, 171, 15/171))
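The rule of thumb mentioned above can also be checked numerically. This is a small sketch using each sample's own observed proportion, as the text does. The names p_hat and checks are illustrative.

```r
x <- c(6, 15)     # successes
n <- c(26, 171)   # sample sizes
p_hat <- x / n

# n*p and n*(1-p) for each sample, using its own observed proportion
checks <- rbind(np = n * p_hat, nq = n * (1 - p_hat))
checks

all(checks > 5)

# Using the pooled proportion instead gives the chi-squared expected
# counts, where the first sample's n*p is below 5
n * sum(x) / sum(n)
```

Whether the rule passes therefore depends on which proportion is plugged in, which is one reason different rules of thumb disagree for these data.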

For these data, the binomial proportion test gives a one-sided p-value of about 0.014. The one-sided prop.test, which applies a continuity correction by default, gives a p-value of 0.03137.

As @EdM mentions in the comments below, some people feel Fisher's exact test may be suitable in this situation. This other page nicely gives references to a few people's views on the appropriateness of Fisher's exact test, and it looks like the matter is not yet settled. This test gives a one-sided p-value of 0.03963:

fisher.test(m, alternative = 'greater')
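For intuition about what Fisher's test computes here, the following sketch compares its one-sided p-value with the corresponding hypergeometric tail probability, conditioning on the table margins. The names tab, p_fisher, and p_hyper are illustrative.

```r
# Rows: populations; columns: successes, failures
tab <- matrix(c(6, 15, 20, 156), ncol = 2)

p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# P(X >= 6) when 26 units are drawn without replacement from a pool
# of 21 successes and 176 failures
p_hyper <- phyper(6 - 1, m = 21, n = 176, k = 26, lower.tail = FALSE)

c(fisher = p_fisher, hypergeometric = p_hyper)
```

The two values coincide, which shows that this test makes no large-sample approximation, although it conditions on both margins being fixed.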