R Prop.test – Addressing Chi-squared Approximation Errors

binomial-distribution, chi-squared-test, proportion, r

I am trying to compare the proportions of two populations using prop.test.

My data are straightforward: the first population is 6/26 and the second is 15/171. I want to test whether the proportion in the first population is significantly greater than in the second.

When I run the prop.test in R, my code is:

prop.test(c(6, 15), c(26, 171), alternative = "greater")

However, I get a warning:

In prop.test(c(6, 15), c(26, 171), alternative = "greater") :
  Chi-squared approximation may be incorrect

My assumption is that this is caused by the small sample size in the first population. Is that correct? I have read this post, which seems to indicate that the issue is indeed small sample size, but the solution provided there isn't applicable to prop.test.

Is there any way to correct for this?

If there is not, is there any way to get a sense of how much the inaccuracy of the chi-squared approximation may be affecting my p-value? In this case, the reported p-value is 0.03137. Can I assume that, even with the potential issue with the chi-squared approximation, the result is still significant at the 5% level, or not necessarily?

Best Answer

The warning appears because one of the expected counts in the chi-squared test is less than 5.

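# Successes and sample sizes for the two groups
a <- c(6, 15)
b <- c(26, 171)
# 2x2 table: rows are groups, columns are successes and failures
m <- matrix(c(a, b - a), ncol = 2)
chisq.test(m)
# Expected counts under independence; the first cell is below 5
chisq.test(m)$expected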
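The expected counts can also be reproduced by hand from the table margins (row total times column total, divided by the grand total). The following is a minimal sketch; the names `expected` and `small_cells` are only illustrative and are not part of the original answer.

```r
# Same 2x2 table as above: rows are groups, columns are successes/failures
m <- matrix(c(6, 15, 26 - 6, 171 - 15), ncol = 2)

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)
expected

# Number of cells whose expected count falls below the usual threshold of 5
small_cells <- sum(expected < 5)
small_cells
```

Only the cell for successes in the smaller group falls below 5, which is what triggers the warning.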

However, that rule of thumb is a bit conservative, and there are other rules of thumb you can consider. This data passes some of them and fails others.

Instead of a chi-squared test, you can also use a two-sample test of binomial proportions (a z-test).

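# Observed proportions and sample sizes
p1 <- 6/26
n1 <- 26
p2 <- 15/171
n2 <- 171
# Pooled proportion under the null hypothesis p1 = p2
p <- (n1 * p1 + n2 * p2) / (n1 + n2)
# z statistic for the difference in proportions
z <- (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))
z
# One-sided p-value for the alternative p1 > p2
pnorm(z, lower.tail = FALSE)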

Here we use a normal approximation to the binomial distribution. For this approximation, there is a rule of thumb that both $np > 5$ and $n(1-p) > 5$, which holds for both proportions. The normal approximation also looks reasonable to me for these two proportions when plotted.
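The rule of thumb can be checked directly. This sketch plugs in each group's observed proportion, as the answer does; that choice, and the names `p_hat` and `checks`, are assumptions made for illustration.

```r
# Successes and sample sizes for the two groups
x <- c(6, 15)
n <- c(26, 171)
p_hat <- x / n

# n*p and n*(1-p) for each group, using the observed proportions
checks <- data.frame(n = n, np = n * p_hat, nq = n * (1 - p_hat))

# TRUE when both quantities exceed 5
checks$ok <- checks$np > 5 & checks$nq > 5
checks
```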

hist(rbinom(10000, 26, 6/26))
hist(rbinom(10000, 171, 15/171))

For this data, the binomial proportion test gives a one-sided p-value of about 0.014. The one-sided prop.test gives a p-value of 0.03137.
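Much of the gap between these two p-values comes from the continuity correction that prop.test() applies by default. As a rough check, switching it off with `correct = FALSE` should reproduce the z-test p-value. The variable names below are illustrative, and suppressWarnings() is used only to silence the approximation warning discussed above.

```r
x <- c(6, 15)
n <- c(26, 171)

# Hand-computed pooled z-test, one-sided
pooled <- sum(x) / sum(n)
z <- (x[1] / n[1] - x[2] / n[2]) / sqrt(pooled * (1 - pooled) * sum(1 / n))
p_z <- pnorm(z, lower.tail = FALSE)

# prop.test without and with the continuity correction
res_nc <- suppressWarnings(
  prop.test(x, n, alternative = "greater", correct = FALSE)
)
res_cc <- suppressWarnings(
  prop.test(x, n, alternative = "greater")
)

c(z_test = p_z, no_correction = res_nc$p.value, with_correction = res_cc$p.value)
```

The uncorrected prop.test is the same test as the z-test, so the choice between the two p-values comes down to whether the continuity correction is wanted.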

As @EdM mentions in the comments below, some people feel Fisher's exact test may be suitable in this situation. This other page gives references to several views on the appropriateness of Fisher's exact test, and it looks like the matter is not yet settled. This test gives a one-sided p-value of 0.03963.

fisher.test(m, alternative = 'greater')
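Because Fisher's exact test conditions on the table margins, the one-sided p-value is an upper-tail hypergeometric probability. The sketch below reproduces it with phyper(); the names `p_hyper` and `p_fisher` are illustrative.

```r
m <- matrix(c(6, 15, 26 - 6, 171 - 15), ncol = 2)

successes <- sum(m[, 1])  # 21 successes overall
failures  <- sum(m[, 2])  # 176 failures overall
drawn     <- sum(m[1, ])  # 26 observations in the first group

# P(at least 6 successes in the first group) with all margins fixed
p_hyper <- phyper(m[1, 1] - 1, successes, failures, drawn, lower.tail = FALSE)

p_fisher <- fisher.test(m, alternative = "greater")$p.value
c(hypergeometric = p_hyper, fisher = p_fisher)
```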

Related Solutions

Chi-Squared Test – Understanding Relationship with Test of Equal Proportions

Very short answer:

The chi-squared test (chisq.test() in R) compares the observed frequencies in each cell of a contingency table with the expected frequencies (computed as the product of the marginal frequencies divided by the total). It is used to determine whether the deviations between the observed and the expected counts are too large to be attributed to chance. Departure from independence is easily checked by inspecting residuals (try ?mosaicplot or ?assocplot, but also look at the vcd package). Use fisher.test() for an exact test (relying on the hypergeometric distribution).

The prop.test() function in R tests whether proportions are equal across groups, or whether they differ from specified theoretical probabilities. It is often referred to as a $z$-test because the test statistic looks like this:

$$ z=\frac{(f_1-f_2)}{\sqrt{\hat p \left(1-\hat p \right) \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} $$

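where $f_i = x_i/n_i$ is the observed proportion in group $i$ ($x_i$ successes out of $n_i$ observations), $\hat p=(x_1+x_2)/(n_1+n_2)$ is the pooled proportion, and the indices $(1,2)$ refer to the first and second row of your table. In a two-way contingency table where $H_0:\; p_1=p_2$, this should yield comparable results to the ordinary $\chi^2$ test: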

> tab <- matrix(c(100, 80, 20, 10), ncol = 2)
> chisq.test(tab)

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab 
X-squared = 0.8823, df = 1, p-value = 0.3476

> prop.test(tab)

    2-sample test for equality of proportions with continuity correction

data:  tab 
X-squared = 0.8823, df = 1, p-value = 0.3476
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.15834617  0.04723506 
sample estimates:
   prop 1    prop 2 
0.8333333 0.8888889
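Both outputs above include the continuity correction. As a hedged sketch of the link to the $z$ statistic, the square of a hand-computed $z$ should match the chi-squared statistic once the correction is turned off with `correct = FALSE`. The names `f`, `p_pool` and `x2_nc` are illustrative.

```r
tab <- matrix(c(100, 80, 20, 10), ncol = 2)

x <- tab[, 1]        # "success" counts per row
n <- rowSums(tab)    # row totals
f <- x / n           # observed proportions
p_pool <- sum(x) / sum(n)

# Pooled z statistic for the difference in proportions
z <- (f[1] - f[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Pearson chi-squared statistic without Yates' correction
x2_nc <- unname(chisq.test(tab, correct = FALSE)$statistic)

c(z_squared = unname(z^2), chi_squared = x2_nc)
```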

For analysis of discrete data with R, I highly recommend R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002), from Laura Thompson.

Solved – Chi-squared test and binomial distribution

You don't need the Stirling approximation to compute the CDF (cumulative distribution function) of the binomial distribution. There is a relationship to the Beta distribution (see Wikipedia). If $X \sim \mathrm{Binom}(n,p)$ then $$ \DeclareMathOperator{\P}{\mathbb{P}} \P(X \le k)=I_{1-p}(n-k,k+1) $$ so the binomial CDF is expressed via the CDF of a Beta-distributed random variable evaluated at $1-p$ (known in mathematics as the regularized incomplete beta function).
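The identity can be checked numerically in R with pbinom() and pbeta(), evaluating the Beta CDF at $1-p$. The values of `n`, `k` and `p` below are arbitrary choices for illustration (with $k < n$).

```r
n <- 26
k <- 5
p <- 0.1

# Binomial CDF: P(X <= k)
lhs <- pbinom(k, size = n, prob = p)

# Beta(n - k, k + 1) CDF evaluated at 1 - p
rhs <- pbeta(1 - p, shape1 = n - k, shape2 = k + 1)

all.equal(lhs, rhs)
```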
