Solved – How to compare frequencies among groups

categorical data, frequency, spss

I want to compare the frequency of a categorical value among 4 groups. What statistical test should I use? (I am using SPSS.)
I am looking for a statistical test that would allow me to say: the frequency of value "V" depends on the group and the groups' frequencies are statistically different for that value.

Here is an example: Group 1 shows value "V" 10 times, Group 2 shows value "V" 15 times… Group 4 shows value "V" 40 times.

Thanks a lot in advance.

Best Answer

The counts of v alone are not enough: the result depends on the number of non-v events in each group.

If the data are as below, with counts of v in a and counts of non-v in b, so that each group has 100 events in total, then the null hypothesis that the probability of v is the same in all groups is overwhelmingly rejected, with a P-value near 0. The test is a chi-squared test of homogeneity. (Output from R.)

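a = c(10,15,50,40)   # counts of v in Groups 1-4
b = c(90,85,50,60)   # counts of non-v in Groups 1-4
DTA = rbind(a,b)     # 2 x 4 table: rows are v / non-v, columns are groups
DTA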
  [,1] [,2] [,3] [,4]
a   10   15   50   40
b   90   85   50   60

chisq.test(DTA)

        Pearson's Chi-squared test

data:  DTA
X-squared = 54.615, df = 3, p-value = 8.296e-12

By contrast, if the counts are as shown below, so that the proportion of v events is roughly 1/4 in all groups (about one v for every three non-v), then a chi-squared test of homogeneity comes nowhere near rejection:

a = c(10,15,50,40)     # counts of v in Groups 1-4 (same as before)
b = c(30,50,160,125)   # counts of non-v, now about three times the v counts
DTA = rbind(a,b)
DTA
  [,1] [,2] [,3] [,4]
a   10   15   50   40
b   30   50  160  125
chisq.test(DTA)

        Pearson's Chi-squared test

data:  DTA
X-squared = 0.061404, df = 3, p-value = 0.996

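Such chi-squared tests find the count expected in each cell under the null hypothesis (in your case $2 \times 4 = 8$ cells), computed as the row total times the column total divided by the grand total. The expected counts are then compared with the observed counts.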

For the second test above, here are the observed and expected counts. For a valid test, all expected counts must exceed 3, and almost all should exceed 5. (No difficulty here, because the smallest expected count is about 9.6.)

v.out = chisq.test(DTA)   # keep the test object to inspect its components
v.out$observed
  [,1] [,2] [,3] [,4]
a   10   15   50   40
b   30   50  160  125
v.out$expected
       [,1]     [,2]     [,3]      [,4]
a  9.583333 15.57292  50.3125  39.53125
b 30.416667 49.42708 159.6875 125.46875
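
To make the comparison of observed and expected counts concrete, here is a minimal sketch in R that reproduces the second test by hand. The variable names (expected, x2, df, p_value, all_ge_5) are my own for illustration; the counts are the ones used above.

```r
# Observed 2 x 4 table from the second example
# (rows: v / non-v; columns: Groups 1-4)
a <- c(10, 15, 50, 40)
b <- c(30, 50, 160, 125)
DTA <- rbind(a, b)

# Expected count for each cell under homogeneity:
# (row total * column total) / grand total
expected <- outer(rowSums(DTA), colSums(DTA)) / sum(DTA)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((DTA - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(DTA) - 1) * (ncol(DTA) - 1)

# Upper-tail P-value from the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Rule-of-thumb check that no expected count is too small
all_ge_5 <- all(expected >= 5)
```

The values of x2, df and p_value should agree with the X-squared, df and p-value lines printed by chisq.test(DTA) above.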

Notes: (a) This test of homogeneity of proportions is mathematically identical to a test of independence between v/non-v and your categories, even though the interpretation of the results may be phrased differently. (b) In such a chi-squared test, it is important to compare counts, not proportions. (c) Also, you must compare counts of v against counts of non-v, not counts of v against total group counts. (d) Other methods of analyzing such data are possible, but I think you will find these chi-squared tests in SPSS.
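
As a rough illustration of notes (b) and (c), suppose you have the counts of v and the total number of events per group, as in the first example. The sketch below (variable names are mine, not from SPSS or the original output) derives the non-v counts before testing, and also uses prop.test(), which accepts counts of v and group totals directly and, for more than two groups, computes the same Pearson statistic.

```r
# Counts of v and total events per group (first example: 100 events each)
v <- c(10, 15, 50, 40)
n <- c(100, 100, 100, 100)

# Note (c): the table needs v against non-v, so derive non-v from the totals
non_v <- n - v
tab <- rbind(v, non_v)
res_table <- chisq.test(tab)

# prop.test() takes the v counts and the totals as separate arguments
res_prop <- prop.test(v, n)

# The mistake described in note (c): v against totals.
# This runs without error but tests the wrong table.
res_wrong <- chisq.test(rbind(v, n))
```

Here res_table and res_prop should report the same X-squared as the first test above, while res_wrong gives a different statistic because the totals double-count the v events.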
