prop.test or chi-squared test on count data with 3 groups

Tags: chi-squared-test, count-data, proportion, r, statistical-significance

I have count data on the number of events observed in independent trials across 3 groups of an experiment (each cell is the number of trials with that event count):

count   A    B    C
0       5    0    5
1       1   25    9
2       9   30   10
3       3   15    8
4       4   13    9
5       4   13    5
6       1    6    8
7       2    4    6
8       0   10    1
9       4    5    2
10      0    5    3

A Kruskal-Wallis test leads me to conclude that the distribution of the number of events observed does not differ significantly between groups. However, I'd also like to know whether a specific event count is significantly more likely in some groups than in others. In particular, I'm interested in event counts of 0 or 1. For example, an event count of 1 is proportionately much more common in group B (25 of 126 trials) than in group A (1 of 33) or group C (9 of 66). I'd like to test whether this difference is significant.

I have looked at using a chi-squared test to compare proportions between groups for a specific outcome but believe this may be invalid due to low frequencies of observation (<5) for some numbers of events. In this case, would R's prop.test be more suitable?

EDIT: post updated to make data easy to copy-and-paste

Best Answer

I can't reproduce your chi-square result. This is what I get in Stata:

          observed frequency
          expected frequency

----------------------------------
          |         which         
    count |      A       B       C
----------+-----------------------
        0 |      5       0       5
          |  1.467   5.600   2.933
          | 
        1 |      1      25       9
          |  5.133  19.600  10.267
          | 
        2 |      9      30      10
          |  7.187  27.440  14.373
          | 
        3 |      3      15       8
          |  3.813  14.560   7.627
          | 
        4 |      4      13       9
          |  3.813  14.560   7.627
          | 
        5 |      4      13       5
          |  3.227  12.320   6.453
          | 
        6 |      1       6       8
          |  2.200   8.400   4.400
          | 
        7 |      2       4       6
          |  1.760   6.720   3.520
          | 
        8 |      0      10       1
          |  1.613   6.160   3.227
          | 
        9 |      4       5       2
          |  1.613   6.160   3.227
          | 
       10 |      0       5       3
          |  1.173   4.480   2.347
----------------------------------

16 cells with expected frequency < 5

          Pearson chi2(20) =  42.0876   Pr = 0.003
 likelihood-ratio chi2(20) =  47.3624   Pr = 0.001
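
For readers without Stata, the expected frequencies and both statistics in the output above can be recomputed by hand. What follows is a minimal Python sketch using only the standard library; the names `observed` and `chi_square_summary` are illustrative and do not come from any package. It computes expected counts under independence (row total × column total / grand total), the Pearson and likelihood-ratio statistics, and the number of cells with expected frequency below 5. P-values are left out because the standard library has no general chi-square distribution function.

```python
import math

# Observed counts from the question: rows are event counts 0..10,
# columns are groups A, B, C.
observed = [
    [5, 0, 5],
    [1, 25, 9],
    [9, 30, 10],
    [3, 15, 8],
    [4, 13, 9],
    [4, 13, 5],
    [1, 6, 8],
    [2, 4, 6],
    [0, 10, 1],
    [4, 5, 2],
    [0, 5, 3],
]


def chi_square_summary(table):
    """Return expected counts, Pearson and LR statistics, df, and small-cell count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    pearson = 0.0
    likelihood_ratio = 0.0
    small_cells = 0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            pearson += (o - e) ** 2 / e
            if o > 0:  # a zero cell contributes nothing to the LR statistic
                likelihood_ratio += 2 * o * math.log(o / e)
            if e < 5:
                small_cells += 1

    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, pearson, likelihood_ratio, df, small_cells


expected, pearson, likelihood_ratio, df, small_cells = chi_square_summary(observed)
print(f"Pearson chi2({df}) = {pearson:.4f}")
print(f"likelihood-ratio chi2({df}) = {likelihood_ratio:.4f}")
print(f"{small_cells} cells with expected frequency < 5")
```

Run as a script, this should agree with the Stata output above to the precision shown there.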

We can note:

  1. Strong rejection of the null hypothesis of no association between group and count. Plainly put, the groups really are different.

  2. Caveat: several small expected frequencies (16 of the 33 cells are below 5), so the chi-square approximation may be unreliable.

  3. Caveat: The chi-square test pays no attention to the ordering 0 ... 10 or what those values are. They are just 11 categories.
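
Caveat 3 can be checked directly: reordering the rows of the table, so that the counts 0 ... 10 appear in a different order, leaves the Pearson statistic unchanged. A minimal Python sketch follows (standard library only; `pearson_chi2` is an illustrative helper name, and reversing the rows stands in for any permutation):

```python
import math

# Observed counts from the question: rows are event counts 0..10,
# columns are groups A, B, C.
observed = [
    [5, 0, 5],
    [1, 25, 9],
    [9, 30, 10],
    [3, 15, 8],
    [4, 13, 9],
    [4, 13, 5],
    [1, 6, 8],
    [2, 4, 6],
    [0, 10, 1],
    [4, 5, 2],
    [0, 5, 3],
]


def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / grand_total  # expected count under independence
            statistic += (o - e) ** 2 / e
    return statistic


original = pearson_chi2(observed)
# Reverse the row order: count 10 first, count 0 last.
reordered = pearson_chi2(observed[::-1])

# The two statistics agree up to floating-point rounding.
print(math.isclose(original, reordered, rel_tol=1e-9))
```

The statistic depends only on the cell counts and the margins, which is why a test that respects the ordering of the counts can reach a different conclusion from the chi-square test on the same data.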

The distributions do look different.


I can't comment helpfully on the idea of focusing on 0 and 1. I am not a routine user of R and haven't looked at prop.test, but note that it is best to phrase questions in terms of statistical issues, not the software you happen to be using.
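
On the statistical question of a single count, one way to frame it is to collapse the table to "count equals k" versus "any other count" within each group, then test whether the three proportions are equal with a Pearson chi-square on the resulting 3 × 2 table. For more than two groups, R's prop.test amounts to this same Pearson chi-square without continuity correction, so it does not avoid the small-expected-frequency concern. The sketch below is in Python with only the standard library; `single_count_test` is a hypothetical helper name, and the closed-form p-value exp(-x/2) is valid only because the test has 2 degrees of freedom. Choosing k after inspecting the data makes the nominal p-value optimistic.

```python
import math

# Counts from the question, keyed by group; list index = event count 0..10.
counts = {
    "A": [5, 1, 9, 3, 4, 4, 1, 2, 0, 4, 0],
    "B": [0, 25, 30, 15, 13, 13, 6, 4, 10, 5, 5],
    "C": [5, 9, 10, 8, 9, 5, 8, 6, 1, 2, 3],
}


def single_count_test(counts, k):
    """Pearson chi-square for 'count == k' vs 'other' across groups.

    Returns (statistic, p_value, smallest_expected). The p-value formula
    exp(-x / 2) is exact for 2 degrees of freedom only, i.e. 3 groups.
    """
    if len(counts) != 3:
        raise ValueError("the closed-form p-value assumes exactly 3 groups")
    hits = [group[k] for group in counts.values()]
    totals = [sum(group) for group in counts.values()]
    # One row per group: [trials with count k, trials with any other count].
    table = [[h, n - h] for h, n in zip(hits, totals)]

    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(totals)
    statistic = 0.0
    smallest_expected = math.inf
    for row, n in zip(table, totals):
        for o, c in zip(row, col_totals):
            e = n * c / grand_total  # expected count under equal proportions
            statistic += (o - e) ** 2 / e
            smallest_expected = min(smallest_expected, e)
    p_value = math.exp(-statistic / 2)  # chi-square survival function, df = 2
    return statistic, p_value, smallest_expected


for k in (0, 1):
    stat, p, min_e = single_count_test(counts, k)
    print(f"count {k}: chi2(2) = {stat:.3f}, p = {p:.4f}, min expected = {min_e:.3f}")
```

The smallest expected frequencies match the corresponding rows of the Stata table: about 5.1 for a count of 1, but only about 1.5 for a count of 0, so the small-sample concern is more pressing for 0 than for 1.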