R Confidence Interval – How to Find the Difference in Proportions Using Chi-Squared Test


I am comparing two different groups based on working status. In this instance I am looking at occupation, so I have a table that looks like this:

mat1 <- matrix(c(90, 32, 9, 8, 46, 7, 8, 5), nrow = 4, ncol = 2,
               dimnames = list(c("A", "B", "C", "D"),
                               c("Working", "Not Working")))
#   Working Not Working
# A      90          46
# B      32           7
# C       9           8
# D       8           5

I initially used the p-value of my chi-squared test to decide whether there was a significant difference between the working and not-working groups. It has been suggested to me that I should instead describe and compare the magnitudes of the differences using confidence intervals. I know that for a 2×2 table I could use prop.test to get a confidence interval, but my table is 4×2. Does anyone know how I would go about obtaining those confidence intervals?
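A minimal sketch of the limitation described above, assuming the `mat1` counts from the question: when `prop.test` is given a matrix with two columns (successes, failures) and more than two rows, it runs the chi-squared test of equal proportions across all rows, whose statistic matches `chisq.test` on the same table, but it returns no confidence interval (R only computes one for one or two groups).

```r
# Counts from the question: rows are occupations, columns are working / not working
mat1 <- matrix(c(90, 32, 9, 8, 46, 7, 8, 5), nrow = 4, ncol = 2,
               dimnames = list(c("A", "B", "C", "D"),
                               c("Working", "Not Working")))

# prop.test() treats column 1 as successes and column 2 as failures;
# the small-expected-count warning is suppressed here
pt <- suppressWarnings(prop.test(mat1))
cs <- suppressWarnings(chisq.test(mat1))

# With four groups the statistic equals Pearson's chi-squared statistic ...
print(c(prop.test = unname(pt$statistic), chisq.test = unname(cs$statistic)))

# ... and no confidence interval is returned
print(is.null(pt$conf.int))
```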

Best Answer

First, let's see if there are differences in the proportion working across the four groups A, B, C, D. (Data similar to yours.)

w = c(90, 32, 9, 3)    # working, groups A-D
nw = c(46, 7, 8, 5)    # not working, groups A-D
TBL = rbind(w, nw)     # 2 x 4 table: rows w/nw, columns A, B, C, D
chisq.test(TBL)

        Pearson's Chi-squared test

data:  TBL
X-squared = 8.7062, df = 3, p-value = 0.03346

Warning message:
In chisq.test(TBL) : 
 Chi-squared approximation may be incorrect

The low counts in groups C and D trigger a warning message (at least one expected count is below 5), which puts the validity of the P-value into doubt. R's 'chisq.test' can instead simulate the P-value by Monte Carlo (argument 'simulate.p.value'), which does not rely on the chi-squared approximation; it still shows a significant effect at the 5% level.

chisq.test(TBL, simulate.p.value = TRUE)$p.value   # Monte Carlo: varies slightly between runs
[1] 0.03098451
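To see which cells trigger the approximation warning, one can inspect the expected counts under independence. The sketch below rebuilds the answer's `TBL` and also fixes a random seed so the simulated P-value is reproducible; the seed value and `B = 10000` are arbitrary choices, not part of the original answer, so the simulated value will not exactly match the one printed above.

```r
w  <- c(90, 32, 9, 3)
nw <- c(46, 7, 8, 5)
TBL <- rbind(w, nw)

# The warning depends on the *expected* counts, not the observed ones
E <- suppressWarnings(chisq.test(TBL))$expected
print(round(E, 2))
print(which(E < 5, arr.ind = TRUE))   # cells with expected count below 5

# Reproducible Monte Carlo P-value; a larger B reduces simulation error
set.seed(1)
p_sim <- chisq.test(TBL, simulate.p.value = TRUE, B = 10000)$p.value
print(p_sim)
```

Here only the "not working" cell of group D has an expected count below 5 (66 × 8 / 200 = 2.64), which is enough for R to issue the warning.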

Significance only barely at the 5% level does not invite extensive ad hoc tests: to avoid false discoveries, such tests would need to show significance at lower levels. Furthermore, it is not clear just which confidence intervals would be of interest. A look at the Pearson residuals, to see whether any groups are strikingly different, perhaps suggests comparing groups A and B. However, the level of significance there is unimpressive, especially if we protect against false discovery.

chisq.test(TBL)$residuals
         [,1]      [,2]       [,3]      [,4]
w  -0.1173306  1.148334 -0.7081676 -1.019365
nw  0.1671828 -1.636247  1.0090588  1.452480
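The Pearson residuals are simply (observed − expected) / √expected for each cell. A short sketch that recomputes them by hand from the answer's data, and checks that their squares add up to the X-squared value of 8.7062 reported earlier:

```r
w  <- c(90, 32, 9, 3)
nw <- c(46, 7, 8, 5)
TBL <- rbind(w, nw)

# Expected counts under independence: row total * column total / grand total
E <- outer(rowSums(TBL), colSums(TBL)) / sum(TBL)

# Pearson residuals: (observed - expected) / sqrt(expected)
pr <- (TBL - E) / sqrt(E)
print(round(pr, 4))

# The squared residuals sum to the chi-squared statistic
print(sum(pr^2))
```

Residuals larger than about 2 in absolute value are often read as noteworthy; none of these reach that, which fits the cautious reading above.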

chisq.test(TBL[, c(1, 2)], correct = FALSE)

        Pearson's Chi-squared test

data:  TBL[, c(1, 2)]
X-squared = 3.6176, df = 1, p-value = 0.05717
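One way to "protect against false discovery" when looking at several pairs is to adjust the pairwise P-values. The sketch below uses R's `pairwise.prop.test` with Holm's adjustment on the answer's data. Note that it reports adjusted P-values, not confidence intervals, and that it calls `prop.test` with its default continuity correction, so the raw A-vs-B P-value behind it differs from the uncorrected 0.05717 above.

```r
w  <- c(A = 90, B = 32, C = 9, D = 3)   # working
nw <- c(A = 46, B = 7,  C = 8, D = 5)   # not working

# All six pairwise two-sample proportion tests with Holm-adjusted P-values
# (low-count warnings for pairs involving C and D are suppressed)
pw <- suppressWarnings(
  pairwise.prop.test(x = w, n = w + nw, p.adjust.method = "holm")
)
print(pw)
```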

You have already said that you know how to use 'prop.test' to get a 95% confidence interval for the difference between the proportions in A and B.
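For completeness, a sketch of that `prop.test` call for groups A and B (90 of 136 and 32 of 39 working). By default R applies a continuity correction to the interval; with `correct = FALSE` the interval should coincide with the textbook Wald interval p̂_A − p̂_B ± z·√(p̂_A(1 − p̂_A)/n_A + p̂_B(1 − p̂_B)/n_B), and the test statistic matches the uncorrected chi-squared statistic (3.6176) shown above. The by-hand comparison is included only as a check.

```r
# Group A: 90 of 136 working; group B: 32 of 39 working
x <- c(A = 90, B = 32)
n <- c(A = 136, B = 39)

# 95% CI for p_A - p_B (continuity correction on by default)
ab <- prop.test(x, n)
print(ab$estimate)
print(ab$conf.int)

# Without the correction: Wald interval, and the same statistic as chisq.test
ab0 <- prop.test(x, n, correct = FALSE)
p <- x / n
z <- qnorm(0.975)
wald <- (p[1] - p[2]) + c(-1, 1) * z * sqrt(sum(p * (1 - p) / n))
print(cbind(prop.test = as.numeric(ab0$conf.int), by_hand = unname(wald)))
print(unname(ab0$statistic))
```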

I don't see a point in looking at other pairs of groups, especially in view of the low counts there. Perhaps you would like to compare group A with the other three groups combined; 'prop.test' can handle that too.
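A possible sketch of that comparison, pooling B, C and D into one "other" group before calling `prop.test`. It uses the answer's data (where group D has 3 working rather than the 8 in the question), so substitute your own counts; the group label `other` is just an illustrative name.

```r
w  <- c(90, 32, 9, 3)   # working, groups A-D
nw <- c(46, 7, 8, 5)    # not working, groups A-D

# Collapse B, C and D into a single comparison group
x <- c(A = w[1], other = sum(w[-1]))
n <- c(A = w[1] + nw[1], other = sum(w[-1] + nw[-1]))

a_vs_rest <- prop.test(x, n)
print(a_vs_rest$estimate)   # proportion working in A and in B + C + D
print(a_vs_rest$conf.int)   # 95% CI for the difference in proportions
```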

If you had additional kinds of analyses in mind using confidence intervals, please be more specific, and maybe one of us can help.
