Solved – prop.test or chi squared test on count data with 3 groups

I have some count data pertaining to number of events observed among independent trials in 3 groups of an experiment:

count   A    B    C
0       5    0    5
1       1   25    9
2       9   30   10
3       3   15    8
4       4   13    9
5       4   13    5
6       1    6    8
7       2    4    6
8       0   10    1
9       4    5    2
10      0    5    3

A Kruskal-Wallis test leads me to conclude that the distribution of event counts does not differ significantly between groups. However, I'd also like to know whether a specific event count is significantly more likely in some groups than in others. In particular, I'm interested in event counts of 0 or 1. For example, an event count of 1 is proportionately much more common in group B than in groups A or C. I'd like to test whether this difference is significant.
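For anyone who wants to check the Kruskal-Wallis step against the table above, here is a minimal sketch in plain Python. It is not the code used in the original post: the function and variable names are made up for illustration, the statistic is computed directly from the frequency table using midranks and the usual tie correction, and the p-value uses the fact that with 3 groups (2 degrees of freedom) the chi-squared upper tail is exactly exp(-H/2).

```python
from math import exp

# Frequency table from the question: counts[g][k] = number of trials in
# group g that produced k events (k = 0..10).
counts = {
    "A": [5, 1, 9, 3, 4, 4, 1, 2, 0, 4, 0],
    "B": [0, 25, 30, 15, 13, 13, 6, 4, 10, 5, 5],
    "C": [5, 9, 10, 8, 9, 5, 8, 6, 1, 2, 3],
}


def kruskal_wallis(counts):
    """Tie-corrected Kruskal-Wallis H computed from a frequency table."""
    n_values = len(next(iter(counts.values())))
    # Number of trials (over all groups) tied at each event count.
    ties = [sum(col[k] for col in counts.values()) for k in range(n_values)]
    n_total = sum(ties)

    # All trials with the same event count share one midrank.
    midranks, seen = [], 0
    for t in ties:
        midranks.append(seen + (t + 1) / 2)
        seen += t

    h = 0.0
    for col in counts.values():
        rank_sum = sum(f * r for f, r in zip(col, midranks))
        h += rank_sum ** 2 / sum(col)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

    # Standard correction for ties.
    h /= 1 - sum(t ** 3 - t for t in ties) / (n_total ** 3 - n_total)
    return h, len(counts) - 1


h, df = kruskal_wallis(counts)
# Valid only because df == 2: chi-squared upper tail is exp(-h / 2).
p_value = exp(-h / 2)
print(f"H = {h:.3f}, df = {df}, p = {p_value:.3f}")
```

Note that this test is sensitive mainly to shifts in location, so a non-significant result here does not rule out differences in the shape of the distributions.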

I have looked at using a chi-squared test to compare proportions between groups for a specific outcome but believe this may be invalid due to low frequencies of observation (<5) for some numbers of events. In this case, would R's prop.test be more suitable?
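To make the comparison concrete, here is a rough sketch in plain Python of the two options for a single outcome (count = 1 or count = 0). As far as I know, for more than two groups R's prop.test computes the ordinary Pearson chi-squared statistic on the 2 x 3 table without continuity correction, so the first function should correspond to it; the second is an exact Freeman-Halton style test that does not rely on large expected frequencies. All names here are my own, and the closed-form p-value exp(-x/2) is only valid because 3 groups give 2 degrees of freedom.

```python
from math import comb, exp

group_sizes = {"A": 33, "B": 126, "C": 66}  # column totals of the table
ones = {"A": 1, "B": 25, "C": 9}            # trials with exactly 1 event
zeros = {"A": 5, "B": 0, "C": 5}            # trials with exactly 0 events


def chi2_proportions(hits, sizes):
    """Pearson chi-squared test of equal proportions across 3 groups."""
    if len(sizes) != 3:
        raise ValueError("closed-form p-value below assumes 3 groups (df = 2)")
    pooled = sum(hits.values()) / sum(sizes.values())
    stat, min_expected = 0.0, float("inf")
    for g, n in sizes.items():
        cells = ((hits[g], n * pooled), (n - hits[g], n * (1 - pooled)))
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    return stat, exp(-stat / 2), min_expected


def exact_proportions(hits, sizes):
    """Exact p-value for a 2 x 3 table with both margins held fixed."""
    n_a, n_b, n_c = (sizes[g] for g in sizes)
    obs = tuple(hits[g] for g in sizes)
    total_hits = sum(obs)
    denom = comb(n_a + n_b + n_c, total_hits)

    def prob(a, b, c):
        # Multivariate hypergeometric probability of one table.
        return comb(n_a, a) * comb(n_b, b) * comb(n_c, c) / denom

    p_obs = prob(*obs)
    p_value = 0.0
    for a in range(total_hits + 1):
        for b in range(total_hits - a + 1):
            p = prob(a, b, total_hits - a - b)
            # Sum every table at most as probable as the observed one.
            if p <= p_obs * (1 + 1e-9):
                p_value += p
    return p_value


stat_1, p_1, min_exp_1 = chi2_proportions(ones, group_sizes)
stat_0, p_0, min_exp_0 = chi2_proportions(zeros, group_sizes)
exact_1 = exact_proportions(ones, group_sizes)
exact_0 = exact_proportions(zeros, group_sizes)

print(f"count = 1: chi2 = {stat_1:.3f}, p = {p_1:.4f}, "
      f"min expected = {min_exp_1:.3f}, exact p = {exact_1:.4f}")
print(f"count = 0: chi2 = {stat_0:.3f}, p = {p_0:.4f}, "
      f"min expected = {min_exp_0:.3f}, exact p = {exact_0:.4f}")
```

The smallest expected frequency is returned alongside each chi-squared result so that the usual "expected < 5" rule of thumb can be checked per outcome; where it fails, the exact p-value is the safer number to read.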

EDIT: post updated to make data easy to copy-and-paste

Best Answer

I can't reproduce your chi-square result. This is what I get in Stata:

          observed frequency
          expected frequency

----------------------------------
          |         which
    count |      A       B       C
----------+-----------------------
        0 |      5       0       5
          |  1.467   5.600   2.933
          |
        1 |      1      25       9
          |  5.133  19.600  10.267
          |
        2 |      9      30      10
          |  7.187  27.440  14.373
          |
        3 |      3      15       8
          |  3.813  14.560   7.627
          |
        4 |      4      13       9
          |  3.813  14.560   7.627
          |
        5 |      4      13       5
          |  3.227  12.320   6.453
          |
        6 |      1       6       8
          |  2.200   8.400   4.400
          |
        7 |      2       4       6
          |  1.760   6.720   3.520
          |
        8 |      0      10       1
          |  1.613   6.160   3.227
          |
        9 |      4       5       2
          |  1.613   6.160   3.227
          |
       10 |      0       5       3
          |  1.173   4.480   2.347
----------------------------------

16 cells with expected frequency < 5

          Pearson chi2(20) =  42.0876   Pr = 0.003
 likelihood-ratio chi2(20) =  47.3624   Pr = 0.001

We can note:

Strong rejection of lack of association. Plainly put, the groups really are different.
Caveat: several small expected frequencies.
Caveat: The chi-square test pays no attention to the ordering 0 ... 10 or what those values are. They are just 11 categories.
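The Pearson chi-square on the full 11 x 3 table can be re-derived without Stata. The following is a sketch in plain Python, not the code that produced the output in this answer; the variable names are illustrative, and the p-value uses a closed form for the chi-squared upper tail that only holds for an even number of degrees of freedom (20 here).

```python
from math import exp, factorial

# Rows: event counts 0..10; columns: groups A, B, C (data from the question).
table = [
    [5, 0, 5],
    [1, 25, 9],
    [9, 30, 10],
    [3, 15, 8],
    [4, 13, 9],
    [4, 13, 5],
    [1, 6, 8],
    [2, 4, 6],
    [0, 10, 1],
    [4, 5, 2],
    [0, 5, 3],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected frequency under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(table, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
small_cells = sum(e < 5 for e_row in expected for e in e_row)


def chi2_sf_even_df(x, df):
    """Upper tail of the chi-squared distribution; valid for even df only."""
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


p_value = chi2_sf_even_df(stat, df)
print(f"Pearson chi2({df}) = {stat:.4f}, p = {p_value:.4f}")
print(f"{small_cells} cells with expected frequency < 5")
```

The count of cells with expected frequency below 5 is computed explicitly because that is the first caveat listed above; the test itself treats 0 ... 10 as unordered categories, which is the second.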

The distributions do look different.

I can't comment helpfully on the idea of focusing on 0 and 1. I am not a routine user of R and haven't looked at prop.test, but note that it is best to phrase questions in terms of statistical issues, not the software you happen to be using.
